Abstract

In this paper, we establish the convergence properties for a majorized alternating direction
method of multipliers (ADMM) for linearly constrained convex optimization problems whose
objectives contain coupled functions. Our convergence analysis relies on the generalized
Mean-Value Theorem, which plays an important role in properly controlling the cross terms due to
the presence of coupled objective functions. Our results in particular show that directly applying
the 2-block ADMM with a large step length to the linearly constrained convex optimization problem
with a quadratically coupled objective function is convergent under mild conditions. We also
provide several iteration complexity results for the algorithm.
1 Introduction

Consider the following convex optimization problem:

\[
\begin{array}{cl}
\min\limits_{u,v} & \theta(u,v) := p(u) + q(v) + \phi(u,v) \\
\mathrm{s.t.} & \mathcal{A}^*u + \mathcal{B}^*v = c,
\end{array}
\tag{1}
\]

where $p : \mathcal{U} \to (-\infty, \infty]$ and $q : \mathcal{V} \to (-\infty, \infty]$ are two closed proper
convex functions (possibly nonsmooth), $\phi : \mathcal{U} \times \mathcal{V} \to (-\infty, \infty)$ is a smooth
convex function whose gradient mapping is Lipschitz continuous, $\mathcal{A} : \mathcal{X} \to \mathcal{U}$ and
$\mathcal{B} : \mathcal{X} \to \mathcal{V}$ are two given linear operators, $c \in \mathcal{X}$ is a given vector, and
$\mathcal{U}$, $\mathcal{V}$ and $\mathcal{X}$ are three real finite dimensional Euclidean spaces each equipped with an
inner product $\langle \cdot, \cdot \rangle$ and its induced norm $\|\cdot\|$.

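Many interesting optimization problems are of the form (1). One particular case is the following
problem whose objective is the sum of a quadratic function and a squared distance function to a
closed convex set:

\[
\begin{array}{cl}
\min\limits_{u,v} & \frac{1}{2}\langle (u,v), \widetilde{\mathcal{Q}}(u,v) \rangle + \frac{\rho}{2}\|(u,v) - \Pi_{\mathcal{K}_1}(u,v)\|^2 \\
\mathrm{s.t.} & \mathcal{A}^*u + \mathcal{B}^*v = c, \\
& u \in \mathcal{K}_2, \; v \in \mathcal{K}_3,
\end{array}
\tag{2}
\]

where $\rho > 0$ is a penalty parameter, $\widetilde{\mathcal{Q}} : \mathcal{U} \times \mathcal{V} \to \mathcal{U} \times \mathcal{V}$ is a
self-adjoint positive semidefinite linear operator, $\mathcal{K}_1 \subseteq \mathcal{U} \times \mathcal{V}$,
$\mathcal{K}_2 \subseteq \mathcal{U}$ and $\mathcal{K}_3 \subseteq \mathcal{V}$ are closed convex sets and
$\Pi_{\mathcal{K}_1}(\cdot, \cdot)$ denotes the metric projection onto $\mathcal{K}_1$.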

One popular way to solve problem (1) is the augmented Lagrangian method (ALM). Given the
Lagrangian multiplier $x \in \mathcal{X}$ of the linear constraint in (1), the augmented Lagrangian function
associated with the parameter $\sigma > 0$ is defined as

\[
\mathcal{L}_\sigma(u,v;x) = \theta(u,v) + \langle x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle + \frac{\sigma}{2}\|\mathcal{A}^*u + \mathcal{B}^*v - c\|^2, \quad (u,v) \in \mathcal{U} \times \mathcal{V}. \tag{3}
\]

The ALM minimizes $\mathcal{L}_\sigma(u,v;x)$ with respect to $(u,v)$ simultaneously, regardless of whether the
objective function is coupled or not, before updating the Lagrangian multiplier $x$ along the gradient
ascent direction. Numerically, however, minimizing $\mathcal{L}_\sigma(u,v;x)$ with respect to $(u,v)$ jointly may
be a difficult task due to the non-separable structure of $\theta(\cdot,\cdot)$ combined with the nonsmoothness
of $p(\cdot)$ and $q(\cdot)$.

When the objective function in (1) is separable for $u$ and $v$, one can alleviate the numerical
difficulty in the ALM by directly applying the alternating direction method of multipliers (ADMM).
The iteration scheme of the ADMM works as follows:

\[
\begin{cases}
u^{k+1} = \arg\min\limits_{u} \mathcal{L}_\sigma(u, v^k; x^k), \\
v^{k+1} = \arg\min\limits_{v} \mathcal{L}_\sigma(u^{k+1}, v; x^k), \\
x^{k+1} = x^k + \tau\sigma(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c),
\end{cases}
\tag{4}
\]

where $\tau > 0$ is the step length. The global convergence of the ADMM with
$\tau \in (0, \frac{1+\sqrt{5}}{2})$ and a separable objective function has been extensively studied in the
literature; see, for example, [...]. For a recent survey, see Eckstein and Yao [5]. Although it is
possible to apply the ADMM directly to problem (1) even if $\phi(\cdot,\cdot)$ is not separable, its
convergence analysis is

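largely non-existent. One way to deal with the non-separability of $\phi(\cdot,\cdot)$ is to introduce a new
variable $w = (u; v) \in \mathcal{U} \times \mathcal{V}$. By letting

\[
\widetilde{\mathcal{A}} = \begin{pmatrix} \mathcal{A} & \mathcal{I}_1 & 0 \end{pmatrix}, \quad
\widetilde{\mathcal{B}} = \begin{pmatrix} \mathcal{B} & 0 & \mathcal{I}_2 \end{pmatrix}, \quad
\widetilde{\mathcal{C}} = \begin{pmatrix} 0 & -\mathcal{I}_1 & -\mathcal{I}_2 \end{pmatrix} \quad \text{and} \quad
\tilde{c} = \begin{pmatrix} c \\ 0 \\ 0 \end{pmatrix}
\]

with identity maps $\mathcal{I}_1 : \mathcal{U} \to \mathcal{U}$ and $\mathcal{I}_2 : \mathcal{V} \to \mathcal{V}$, we can rewrite the
optimization problem (1) equivalently as

\[
\begin{array}{cl}
\min\limits_{u,v,w} & \tilde{\theta}(u,v,w) := p(u) + q(v) + \phi(w) \\
\mathrm{s.t.} & \widetilde{\mathcal{A}}^*u + \widetilde{\mathcal{B}}^*v + \widetilde{\mathcal{C}}^*w = \tilde{c}.
\end{array}
\tag{5}
\]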
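For given $\sigma > 0$, the corresponding augmented Lagrangian function for problem (5) is

\[
\widetilde{\mathcal{L}}_\sigma(u,v,w;x) = \tilde{\theta}(u,v,w) + \langle x, \widetilde{\mathcal{A}}^*u + \widetilde{\mathcal{B}}^*v + \widetilde{\mathcal{C}}^*w - \tilde{c} \rangle + \frac{\sigma}{2}\|\widetilde{\mathcal{A}}^*u + \widetilde{\mathcal{B}}^*v + \widetilde{\mathcal{C}}^*w - \tilde{c}\|^2,
\]

where $(u,v,w) \in \mathcal{U} \times \mathcal{V} \times (\mathcal{U} \times \mathcal{V})$ and $x \in \mathcal{X} \times \mathcal{U} \times \mathcal{V}$.
Directly applying the 3-block ADMM yields the following framework:

\[
\begin{cases}
u^{k+1} = \arg\min\limits_{u} \widetilde{\mathcal{L}}_\sigma(u, v^k, w^k; x^k), \\
v^{k+1} = \arg\min\limits_{v} \widetilde{\mathcal{L}}_\sigma(u^{k+1}, v, w^k; x^k), \\
w^{k+1} = \arg\min\limits_{w} \widetilde{\mathcal{L}}_\sigma(u^{k+1}, v^{k+1}, w; x^k), \\
x^{k+1} = x^k + \tau\sigma(\widetilde{\mathcal{A}}^*u^{k+1} + \widetilde{\mathcal{B}}^*v^{k+1} + \widetilde{\mathcal{C}}^*w^{k+1} - \tilde{c}),
\end{cases}
\]

where $\tau > 0$ is the step length. Even though numerically the 3-block ADMM works well for many
applications, generally it is not a convergent algorithm even if $\tau$ is as small as $10^{-8}$, as shown in the
counterexamples given by Chen et al. [1].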

In this paper, we will conduct a thorough convergence analysis of the 2-block ADMM when
it is applied to problem (1) with non-separable objective functions. Unlike the case with separable
objective functions, there are very few papers on the ADMM targeting problem (1) except for
the work of Hong et al. [12], where the authors studied a majorized multi-block ADMM for linearly
constrained optimization problems with non-separable objectives. When specialized to the 2-block
case for problem (1), their algorithm works as follows:

\[
\begin{cases}
u^{k+1} = \arg\min\limits_{u} \{ p(u) + \langle x^k, \mathcal{A}^*u \rangle + h_1(u; u^k, v^k) \}, \\
v^{k+1} = \arg\min\limits_{v} \{ q(v) + \langle x^k, \mathcal{B}^*v \rangle + h_2(v; u^{k+1}, v^k) \}, \\
x^{k+1} = x^k + \alpha_k\sigma(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c),
\end{cases}
\tag{6}
\]

where $h_1(u; u^k, v^k)$ and $h_2(v; u^{k+1}, v^k)$ are majorization functions of
$\phi(u,v) + \frac{\sigma}{2}\|\mathcal{A}^*u + \mathcal{B}^*v - c\|^2$ at $(u^k, v^k)$ and $(u^{k+1}, v^k)$, respectively, and
$\alpha_k > 0$ is the step length. Hong et al. [12] provided a very general convergence analysis of their
majorized ADMM assuming that the step length $\alpha_k$ is a sufficiently small fixed number or converging
to zero, among other conditions. Since a large step length is almost always desired in practice, one
needs to develop a new convergence theorem beyond the one in [12]. Similar to Hong et al.'s work [12],
our approach also relies on the majorization technique applied to the smooth coupled function
$\phi(\cdot,\cdot)$. One difference is that we majorize $\phi(u,v)$ at $(u^k, v^k)$ before the $(k+1)$th iteration
instead of changing the majorization function based on $(u^{k+1}, v^k)$ when updating $v^{k+1}$ as in (6).
Interestingly, if $\phi(\cdot,\cdot)$ merely consists of quadratically coupled functions and separable smooth
functions, our majorized ADMM is exactly the same as the one proposed by Hong et al. under a proper
choice of the majorization functions. Moreover, for applications like (2), a potential advantage of our
method is that we only need to compute the projection $\Pi_{\mathcal{K}_1}(\cdot,\cdot)$ once in order to compute
$\nabla\phi(\cdot,\cdot)$ as a part of the majorization function within one iteration, while the procedure (6)
needs to compute $\Pi_{\mathcal{K}_1}(\cdot,\cdot)$ at two different points $(u^k, v^k)$ and $(u^{k+1}, v^k)$.

In the subsequent discussions one can see that by making use of nonsmooth analysis,
especially the generalized Mean-Value Theorem, we are able to establish the global convergence
and the iteration complexity for our majorized ADMM with the step length
$\tau \in (0, \frac{1+\sqrt{5}}{2})$. To the best of our knowledge, this is the first paper providing the
convergence properties of the majorized ADMM with a large step length for solving linearly constrained
convex optimization problems with coupled smooth objective functions.

The remaining parts of our paper are organized as follows. In the next section, we provide some 
preliminary results. Section 3 focuses on our framework of a majorized ADMM and two important 
inequalities for the convergence analysis. In Section 4, we prove the global convergence and several 
iteration complexity results of the proposed algorithm. We conclude our paper in the last section. 


2 Preliminaries 


In this section, we shall provide some preliminary results that will be used in our subsequent 
discussions. 


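Denote $w \equiv (u; v) \in \mathcal{U} \times \mathcal{V}$. Since $\phi$ is assumed to be a convex function with a
Lipschitz continuous gradient, $\nabla\phi(\cdot)$ is globally Lipschitz continuous and $\nabla^2\phi(\cdot)$ exists
almost everywhere. Thus, the following Clarke's generalized Hessian at given
$w \in \mathcal{U} \times \mathcal{V}$ is well defined [2]:

\[
\partial^2\phi(w) = \mathrm{conv}\{ \lim_{w^k \to w} \nabla^2\phi(w^k) : \nabla^2\phi(w^k) \text{ exists} \}, \tag{7}
\]

where "conv{S}" denotes the convex hull of a given set $S$. Note that $\mathcal{W}$ is self-adjoint and positive
semidefinite, i.e., $\mathcal{W} \succeq 0$, for any $\mathcal{W} \in \partial^2\phi(w)$, $w \in \mathcal{U} \times \mathcal{V}$. In [11],
Hiriart-Urruty and Nguyen provide a second order Mean-Value Theorem for $\phi$, which states that for any
$w'$ and $w$ in $\mathcal{U} \times \mathcal{V}$, there exist $z \in [w', w]$ and $\mathcal{W} \in \partial^2\phi(z)$ such that

\[
\phi(w) = \phi(w') + \langle \nabla\phi(w'), w - w' \rangle + \frac{1}{2}\langle w - w', \mathcal{W}(w - w') \rangle,
\]

where $[w', w]$ denotes the line segment connecting $w'$ and $w$.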

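Since $\nabla\phi$ is globally Lipschitz continuous, there exist two self-adjoint positive semidefinite
linear operators $\mathcal{Q}$ and $\mathcal{H} : \mathcal{U} \times \mathcal{V} \to \mathcal{U} \times \mathcal{V}$ such that for any
$w \in \mathcal{U} \times \mathcal{V}$,

\[
\mathcal{Q} \preceq \mathcal{W} \preceq \mathcal{Q} + \mathcal{H} \quad \forall\, \mathcal{W} \in \partial^2\phi(w). \tag{8}
\]

Thus, for any $w, w' \in \mathcal{U} \times \mathcal{V}$, we have

\[
\phi(w) \geq \phi(w') + \langle \nabla\phi(w'), w - w' \rangle + \frac{1}{2}\|w' - w\|^2_{\mathcal{Q}} \tag{9}
\]

and

\[
\phi(w) \leq \hat{\phi}(w; w') := \phi(w') + \langle \nabla\phi(w'), w - w' \rangle + \frac{1}{2}\|w' - w\|^2_{\mathcal{Q}+\mathcal{H}}. \tag{10}
\]

In this paper we further assume that

\[
\mathcal{H} = \mathrm{Diag}(\mathcal{D}_1, \mathcal{D}_2), \tag{11}
\]

where $\mathcal{D}_1 : \mathcal{U} \to \mathcal{U}$ and $\mathcal{D}_2 : \mathcal{V} \to \mathcal{V}$ are two self-adjoint positive semidefinite
linear operators. In fact, this kind of structure naturally appears in applications like (2), where the
best possible lower bound of the generalized Hessian is $\widetilde{\mathcal{Q}}$ and the best possible upper bound of
the generalized Hessian is $\widetilde{\mathcal{Q}} + \mathcal{I}$, where
$\mathcal{I} : \mathcal{U} \times \mathcal{V} \to \mathcal{U} \times \mathcal{V}$ is the identity operator. For this case, the tightest
estimation of $\mathcal{H}$ is $\mathcal{I}$, which is block diagonal.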
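The bounds (9) and (10) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the toy function phi(u, v) = 0.5*(u - v)^2 + log(1 + exp(u)) is not taken from the paper, and for it one may take Q = [[1, -1], [-1, 1]] and H = Diag(1/4, 0), because the second derivative of log(1 + exp(u)) lies in (0, 1/4]. All function names are hypothetical.

```python
import math
import random

def phi(u, v):
    """Toy coupled function (an assumption): 0.5*(u - v)**2 + log(1 + exp(u))."""
    return 0.5 * (u - v) ** 2 + math.log1p(math.exp(u))

def grad_phi(u, v):
    """Gradient of the toy function phi."""
    sig = 1.0 / (1.0 + math.exp(-u))
    return (u - v) + sig, (v - u)

def quad(du, dv, m11, m12, m22):
    """Value of <d, M d> for a symmetric 2x2 matrix M and d = (du, dv)."""
    return m11 * du * du + 2.0 * m12 * du * dv + m22 * dv * dv

def bounds(w, w0):
    """Right-hand sides of (9) and (10) at w, expanded around w0."""
    (u, v), (u0, v0) = w, w0
    gu, gv = grad_phi(u0, v0)
    du, dv = u - u0, v - v0
    linear = phi(u0, v0) + gu * du + gv * dv
    lower = linear + 0.5 * quad(du, dv, 1.0, -1.0, 1.0)   # weight Q
    upper = linear + 0.5 * quad(du, dv, 1.25, -1.0, 1.0)  # weight Q + H
    return lower, upper

def check(trials=1000, seed=0):
    """Return True if lower <= phi(w) <= upper on random pairs (w, w0)."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        w0 = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        lower, upper = bounds(w, w0)
        if not (lower - 1e-9 <= phi(*w) <= upper + 1e-9):
            return False
    return True
```

Calling `check()` samples random pairs and confirms that the majorizer sits above phi and the quadratic minorizer sits below it, up to a small floating-point tolerance.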

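Since the coupled function $\phi(u,v)$ consists of two block variables $u$ and $v$, the operators $\mathcal{Q}$
and $\mathcal{W}$ can be decomposed accordingly as

\[
\mathcal{Q} = \begin{pmatrix} \mathcal{Q}_{11} & \mathcal{Q}_{12} \\ \mathcal{Q}_{12}^* & \mathcal{Q}_{22} \end{pmatrix}
\quad \text{and} \quad
\mathcal{W} = \begin{pmatrix} \mathcal{W}_{11} & \mathcal{W}_{12} \\ \mathcal{W}_{12}^* & \mathcal{W}_{22} \end{pmatrix},
\]

where $\mathcal{W}_{11}, \mathcal{Q}_{11} : \mathcal{U} \to \mathcal{U}$ and $\mathcal{W}_{22}, \mathcal{Q}_{22} : \mathcal{V} \to \mathcal{V}$ are
self-adjoint positive semidefinite linear operators, and
$\mathcal{W}_{12}, \mathcal{Q}_{12} : \mathcal{V} \to \mathcal{U}$ are two linear mappings whose adjoints are given by
$\mathcal{W}_{12}^*$ and $\mathcal{Q}_{12}^*$, respectively. Denote $\eta \in [0,1]$ as a constant that satisfies

\[
|\langle u, (\mathcal{W}_{12} - \mathcal{Q}_{12})v \rangle| \leq \frac{\eta}{2}(\|u\|^2_{\mathcal{D}_1} + \|v\|^2_{\mathcal{D}_2}) \quad \forall\, \mathcal{W} \in \partial^2\phi(u,v), \; u \in \mathcal{U}, \; v \in \mathcal{V}. \tag{12}
\]

Note that (12) always holds true for $\eta = 1$ according to the Cauchy-Schwarz inequality.

In order to prove the convergence of the proposed majorized ADMM, the following constraint
qualification is needed:

Assumption 2.1 There exists $(\hat{u}, \hat{v}) \in \mathrm{ri}(\mathrm{dom}(p) \times \mathrm{dom}(q))$ such that
$\mathcal{A}^*\hat{u} + \mathcal{B}^*\hat{v} = c$.

Let $\partial p$ and $\partial q$ be the subdifferential mappings of $p$ and $q$, respectively. Define the set-valued
mapping $F$ by

\[
F(u,v,x) := \nabla\phi(w) + \begin{pmatrix} \partial p(u) + \mathcal{A}x \\ \partial q(v) + \mathcal{B}x \end{pmatrix}, \quad (u,v,x) \in \mathcal{U} \times \mathcal{V} \times \mathcal{X}.
\]

Under Assumption 2.1, $(\bar{u}, \bar{v})$ is optimal to (1) if and only if there exists $\bar{x} \in \mathcal{X}$ such that the
following Karush-Kuhn-Tucker (KKT) condition holds:

\[
\begin{cases}
0 \in F(\bar{u}, \bar{v}, \bar{x}), \\
\mathcal{A}^*\bar{u} + \mathcal{B}^*\bar{v} = c,
\end{cases}
\tag{13}
\]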


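which is equivalent to the following variational inequality:

\[
\begin{aligned}
& (p(u) + q(v)) - (p(\bar{u}) + q(\bar{v})) + \langle w - \bar{w}, \nabla\phi(\bar{w}) \rangle + \langle u - \bar{u}, \mathcal{A}\bar{x} \rangle + \langle v - \bar{v}, \mathcal{B}\bar{x} \rangle \\
& \quad - \langle x - \bar{x}, \mathcal{A}^*\bar{u} + \mathcal{B}^*\bar{v} - c \rangle \geq 0 \quad \forall\, (u,v,x) \in \mathcal{U} \times \mathcal{V} \times \mathcal{X}.
\end{aligned}
\tag{14}
\]

Motivated by Nesterov's definition of an $\varepsilon$-approximation solution based on the first order
optimality condition [..., Definition 1], we say that
$(\tilde{u}, \tilde{v}, \tilde{x}) \in \mathcal{U} \times \mathcal{V} \times \mathcal{X}$ is an $\varepsilon$-approximation solution to
problem (1) if

\[
\begin{aligned}
& (p(\tilde{u}) + q(\tilde{v})) - (p(u) + q(v)) + \langle \tilde{w} - w, \nabla\phi(w) \rangle + \langle \tilde{u} - u, \mathcal{A}x \rangle + \langle \tilde{v} - v, \mathcal{B}x \rangle \\
& \quad - \langle \tilde{x} - x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle \leq \varepsilon \quad \forall\, (u,v,x) \in \mathbb{B}(\tilde{u}, \tilde{v}, \tilde{x}),
\end{aligned}
\tag{15}
\]

where $\mathbb{B}(\tilde{u}, \tilde{v}, \tilde{x}) = \{ (u,v,x) \in \mathcal{U} \times \mathcal{V} \times \mathcal{X} : \|(u,v,x) - (\tilde{u}, \tilde{v}, \tilde{x})\| \leq 1 \}$.

Furthermore, since $p$ and $q$ are convex functions, $\partial p(\cdot)$ and $\partial q(\cdot)$ are maximal monotone
operators. Then, for any $u, \hat{u} \in \mathrm{dom}(p)$, $\xi \in \partial p(u)$, and $\hat{\xi} \in \partial p(\hat{u})$, we have

\[
\langle u - \hat{u}, \xi - \hat{\xi} \rangle \geq 0, \tag{16}
\]

and similarly for any $v, \hat{v} \in \mathrm{dom}(q)$, $\zeta \in \partial q(v)$, and $\hat{\zeta} \in \partial q(\hat{v})$, we have

\[
\langle v - \hat{v}, \zeta - \hat{\zeta} \rangle \geq 0. \tag{17}
\]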
3 A majorized ADMM with coupled objective functions


In this section, we will first present the framework of our majorized ADMM and then prove two
important inequalities that play an essential role in our convergence analysis.

Let $\sigma > 0$. For given $w' = (u', v') \in \mathcal{U} \times \mathcal{V}$, define the following majorized augmented
Lagrangian function associated with (1):

\[
\widehat{\mathcal{L}}_\sigma(w; (x, w')) := p(u) + q(v) + \hat{\phi}(w; w') + \langle x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle + \frac{\sigma}{2}\|\mathcal{A}^*u + \mathcal{B}^*v - c\|^2,
\]

where $(w, x) = (u,v,x) \in \mathcal{U} \times \mathcal{V} \times \mathcal{X}$ and the majorized function $\hat{\phi}$ is given by (10).
Then our proposed algorithm works as follows:

Majorized ADMM: A majorized ADMM with coupled objective functions

Choose an initial point $(u^0, v^0, x^0) \in \mathrm{dom}(p) \times \mathrm{dom}(q) \times \mathcal{X}$ and parameters $\tau > 0$.
Let $\mathcal{S}$ and $\mathcal{T}$ be given self-adjoint positive semidefinite linear operators. Set $k := 0$. Iterate
until convergence:

Step 1. Compute $u^{k+1} = \arg\min\limits_{u \in \mathcal{U}} \{ \widehat{\mathcal{L}}_\sigma(u, v^k; (x^k, w^k)) + \frac{1}{2}\|u - u^k\|^2_{\mathcal{S}} \}$.

Step 2. Compute $v^{k+1} = \arg\min\limits_{v \in \mathcal{V}} \{ \widehat{\mathcal{L}}_\sigma(u^{k+1}, v; (x^k, w^k)) + \frac{1}{2}\|v - v^k\|^2_{\mathcal{T}} \}$.

Step 3. Compute $x^{k+1} = x^k + \tau\sigma(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c)$.
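The three steps above can be sketched on a scalar toy instance. Everything specific in this sketch is an assumption made for illustration and is not taken from the paper: U = V = X = R, A = B = 1, p(u) = lam*|u|, q = 0, and phi(u, v) = 0.5*(u - v)^2 + log(1 + exp(u)), for which Q = [[1, -1], [-1, 1]] and H = Diag(1/4, 0) are valid Hessian bounds. With these choices Step 1 reduces to a soft-thresholding formula and Step 2 to a linear equation.

```python
import math

def soft_threshold(z, lam):
    """Proximal map of lam*|.|: shrink z toward zero by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def grad_phi(u, v):
    """Gradient of the toy phi(u, v) = 0.5*(u - v)**2 + log(1 + exp(u))."""
    sig = 1.0 / (1.0 + math.exp(-u))
    return (u - v) + sig, (v - u)

def majorized_admm(c=1.0, lam=0.5, sigma=1.0, tau=1.6, S=0.0, T=0.0, iters=300):
    """Scalar sketch of Steps 1-3 for min lam*|u| + phi(u, v) s.t. u + v = c."""
    # Assumed Hessian bounds: Q <= W <= Q + Diag(D1, D2).
    Q11, Q12, Q22 = 1.0, -1.0, 1.0
    D1, D2 = 0.25, 0.0
    u = v = x = 0.0
    for _ in range(iters):
        gu, gv = grad_phi(u, v)
        # Step 1: minimize lam*|u| + 0.5*a*u**2 - b*u over u.
        a = Q11 + D1 + S + sigma
        b = (Q11 + D1 + S) * u - gu - x - sigma * (v - c)
        u_new = soft_threshold(b, lam) / a
        # Step 2: stationarity of the smooth majorized model in v.
        m = Q22 + D2 + T
        v_new = (m * v - gv - Q12 * (u_new - u) - x - sigma * (u_new - c)) / (m + sigma)
        # Step 3: multiplier update with step length tau.
        x = x + tau * sigma * (u_new + v_new - c)
        u, v = u_new, v_new
    return u, v, x
```

With the default arguments the step length tau = 1.6 is larger than 1 but below (1 + sqrt(5))/2. The returned triple can be checked against the KKT system (13): the constraint residual u + v - c and the two stationarity residuals should all be close to zero.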


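In order to simplify subsequent discussions, for $k = 0, 1, 2, \ldots$, define

\[
\begin{cases}
\tilde{x}^{k+1} := x^k + \sigma(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c), \\
\Xi_{k+1} := \|v^{k+1} - v^k\|^2_{\mathcal{D}_2+\mathcal{T}} + \eta\|u^{k+1} - u^k\|^2_{\mathcal{D}_1}, \\
\Theta_{k+1} := \|u^{k+1} - u^k\|^2_{\mathcal{S}} + \|v^{k+1} - v^k\|^2_{\mathcal{T}} + \frac{1}{4}\|w^{k+1} - w^k\|^2_{\mathcal{Q}}, \\
\Gamma_{k+1} := \Theta_{k+1} + \min(\tau, 1 + \tau - \tau^2)\|v^{k+1} - v^k\|^2_{\sigma\mathcal{B}\mathcal{B}^*} - \|u^{k+1} - u^k\|^2_{\eta\mathcal{D}_1} - \|v^{k+1} - v^k\|^2_{\eta\mathcal{D}_2},
\end{cases}
\tag{18}
\]

and denote for $(u,v,x) \in \mathcal{U} \times \mathcal{V} \times \mathcal{X}$,

\[
\begin{cases}
\Phi_k(u,v,x) := (\tau\sigma)^{-1}\|x^k - x\|^2 + \|u^k - u\|^2_{\mathcal{D}_1+\mathcal{S}} + \|v^k - v\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}} + \frac{1}{2}\|w^k - w\|^2_{\mathcal{Q}} \\
\qquad\qquad\qquad + \sigma\|\mathcal{A}^*u + \mathcal{B}^*v^k - c\|^2, \\
\Psi_k(u,v,x) := \Phi_k(u,v,x) + \|w^k - w\|^2_{\mathcal{Q}} + \max(1 - \tau, 1 - \tau^{-1})\sigma\|\mathcal{A}^*u^k + \mathcal{B}^*v^k - c\|^2.
\end{cases}
\tag{19}
\]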


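Proposition 3.1 Suppose that the solution set of problem (1) is nonempty and Assumption 2.1
holds. Assume that $\mathcal{S}$ and $\mathcal{T}$ are chosen such that the sequence $\{(u^k, v^k, x^k)\}$ is well defined.
Then the following conclusions hold:

(i) For $\tau \in (0,1]$, we have that for any $k \geq 0$ and $(u,v,x) \in \mathcal{U} \times \mathcal{V} \times \mathcal{X}$,

\[
\begin{aligned}
& (p(u^{k+1}) + q(v^{k+1})) - (p(u) + q(v)) + \langle w^{k+1} - w, \nabla\phi(w) \rangle + \langle u^{k+1} - u, \mathcal{A}x \rangle + \langle v^{k+1} - v, \mathcal{B}x \rangle \\
& \quad - \langle \tilde{x}^{k+1} - x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle + \frac{1}{2}(\Phi_{k+1}(u,v,x) - \Phi_k(u,v,x)) \\
& \leq -\frac{1}{2}\big(\Theta_{k+1} + \sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\|^2 + (1 - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2\big).
\end{aligned}
\tag{20}
\]

(ii) For $\tau > 0$, we have that for any $k \geq 1$ and $(u,v,x) \in \mathcal{U} \times \mathcal{V} \times \mathcal{X}$,

\[
\begin{aligned}
& (p(u^{k+1}) + q(v^{k+1})) - (p(u) + q(v)) + \langle w^{k+1} - w, \nabla\phi(w) \rangle + \langle u^{k+1} - u, \mathcal{A}x \rangle + \langle v^{k+1} - v, \mathcal{B}x \rangle \\
& \quad - \langle \tilde{x}^{k+1} - x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle + \frac{1}{2}\big(\Psi_{k+1}(u,v,x) + \Xi_{k+1} - (\Psi_k(u,v,x) + \Xi_k)\big) \\
& \leq -\frac{1}{2}\big(\Gamma_{k+1} + \min(1, 1 + \tau^{-1} - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2\big).
\end{aligned}
\tag{21}
\]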


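Proof. In the majorized ADMM iteration scheme, the optimality condition for $(u^{k+1}, v^{k+1})$ is

\[
\begin{cases}
0 \in \partial p(u^{k+1}) + \nabla_u\phi(w^k) + \mathcal{A}x^k + \sigma\mathcal{A}(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c) + (\mathcal{Q}_{11} + \mathcal{D}_1 + \mathcal{S})(u^{k+1} - u^k), \\
0 \in \partial q(v^{k+1}) + \nabla_v\phi(w^k) + \mathcal{B}x^k + \sigma\mathcal{B}(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c) + (\mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T})(v^{k+1} - v^k) \\
\qquad + \mathcal{Q}_{12}^*(u^{k+1} - u^k).
\end{cases}
\tag{22}
\]

Denote

\[
\begin{cases}
a^{k+1} := -x^{k+1} - (1 - \tau)\sigma(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c) - \sigma\mathcal{B}^*(v^k - v^{k+1}), \\
b^{k+1} := -x^{k+1} - (1 - \tau)\sigma(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c).
\end{cases}
\]

Then by noting that

\[
x^{k+1} = x^k + \tau\sigma(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c),
\]

we can rewrite (22) as

\[
\begin{cases}
\mathcal{A}a^{k+1} - \nabla_u\phi(w^k) - (\mathcal{Q}_{11} + \mathcal{D}_1 + \mathcal{S})(u^{k+1} - u^k) \in \partial p(u^{k+1}), \\
\mathcal{B}b^{k+1} - \nabla_v\phi(w^k) - (\mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T})(v^{k+1} - v^k) - \mathcal{Q}_{12}^*(u^{k+1} - u^k) \in \partial q(v^{k+1}).
\end{cases}
\tag{23}
\]

Therefore, by the convexity of $p$ and $q$, we have that for any $u \in \mathcal{U}$ and $v \in \mathcal{V}$,

\[
\begin{cases}
p(u) \geq p(u^{k+1}) + \langle u - u^{k+1}, \mathcal{A}a^{k+1} - \nabla_u\phi(w^k) - (\mathcal{Q}_{11} + \mathcal{D}_1 + \mathcal{S})(u^{k+1} - u^k) \rangle, \\
q(v) \geq q(v^{k+1}) + \langle v - v^{k+1}, \mathcal{B}b^{k+1} - \nabla_v\phi(w^k) - (\mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T})(v^{k+1} - v^k) - \mathcal{Q}_{12}^*(u^{k+1} - u^k) \rangle.
\end{cases}
\tag{24}
\]

By noting the relationship between $x^{k+1}$ and $x^k$, we obtain from the above inequalities that
for any $(u,v) \in \mathcal{U} \times \mathcal{V}$,

\[
\begin{aligned}
& (p(u^{k+1}) + q(v^{k+1})) - (p(u) + q(v)) + \langle w^{k+1} - w, \nabla\phi(w) \rangle + \langle u^{k+1} - u, \mathcal{A}x \rangle + \langle v^{k+1} - v, \mathcal{B}x \rangle \\
& \quad - \langle \tilde{x}^{k+1} - x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle \\
& \leq \sigma\langle \mathcal{B}^*(v^{k+1} - v^k), \mathcal{A}^*(u^{k+1} - u) \rangle - \langle w^{k+1} - w, \nabla\phi(w^k) - \nabla\phi(w) \rangle - \langle \mathcal{Q}_{12}^*(u^{k+1} - u^k), v^{k+1} - v \rangle \\
& \quad - \langle (\mathcal{Q}_{11} + \mathcal{D}_1 + \mathcal{S})(u^{k+1} - u^k), u^{k+1} - u \rangle - \langle (\mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T})(v^{k+1} - v^k), v^{k+1} - v \rangle \\
& \quad - (\tau\sigma)^{-1}\langle x^{k+1} - x^k, x^{k+1} - x \rangle - (1 - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2.
\end{aligned}
\tag{25}
\]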

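By taking $(w, w') = (w, w^k)$ and $(w^{k+1}, w)$ in (9), we know that

\[
\begin{aligned}
\phi(w) & \geq \phi(w^k) + \langle \nabla\phi(w^k), w - w^k \rangle + \frac{1}{2}\|w - w^k\|^2_{\mathcal{Q}}, \\
\phi(w^{k+1}) & \geq \phi(w) + \langle \nabla\phi(w), w^{k+1} - w \rangle + \frac{1}{2}\|w^{k+1} - w\|^2_{\mathcal{Q}}.
\end{aligned}
\]

By taking $(w, w') = (w^{k+1}, w^k)$ in (10), we can also get that

\[
\phi(w^{k+1}) \leq \phi(w^k) + \langle \nabla\phi(w^k), w^{k+1} - w^k \rangle + \frac{1}{2}\|w^{k+1} - w^k\|^2_{\mathcal{Q}+\mathcal{H}}.
\]

Putting the above three inequalities together, we get

\[
\langle \nabla\phi(w^k) - \nabla\phi(w), w^{k+1} - w \rangle \geq \frac{1}{2}(\|w^k - w\|^2_{\mathcal{Q}} + \|w^{k+1} - w\|^2_{\mathcal{Q}}) - \frac{1}{2}\|w^{k+1} - w^k\|^2_{\mathcal{Q}+\mathcal{H}}. \tag{26}
\]

Note that

\[
\begin{aligned}
& \frac{1}{2}(\|w^{k+1} - w^k\|^2_{\mathcal{Q}} - \|w^{k+1} - w\|^2_{\mathcal{Q}} - \|w^k - w\|^2_{\mathcal{Q}}) - \langle \mathcal{Q}_{11}(u^{k+1} - u^k), u^{k+1} - u \rangle \\
& \quad - \langle \mathcal{Q}_{22}(v^{k+1} - v^k), v^{k+1} - v \rangle - \langle \mathcal{Q}_{12}^*(u^{k+1} - u^k), v^{k+1} - v \rangle \\
& = \frac{1}{2}(\|w^{k+1} - w^k\|^2_{\mathcal{Q}} - \|w^{k+1} - w\|^2_{\mathcal{Q}} - \|w^k - w\|^2_{\mathcal{Q}}) - \langle \mathcal{Q}(w^{k+1} - w^k), w^{k+1} - w \rangle \\
& \quad + \langle \mathcal{Q}_{12}(v^{k+1} - v^k), u^{k+1} - u \rangle \\
& = -\|w^{k+1} - w\|^2_{\mathcal{Q}} + \langle \mathcal{Q}_{12}(v^{k+1} - v^k), u^{k+1} - u \rangle.
\end{aligned}
\tag{27}
\]

Substituting (26) and (27) into (25) and by the assumption (11), we can further obtain that

\[
\begin{aligned}
& (p(u^{k+1}) + q(v^{k+1})) - (p(u) + q(v)) + \langle w^{k+1} - w, \nabla\phi(w) \rangle + \langle u^{k+1} - u, \mathcal{A}x \rangle + \langle v^{k+1} - v, \mathcal{B}x \rangle \\
& \quad - \langle \tilde{x}^{k+1} - x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle \\
& \leq \sigma\langle \mathcal{B}^*(v^{k+1} - v^k), \mathcal{A}^*(u^{k+1} - u) \rangle - \|w^{k+1} - w\|^2_{\mathcal{Q}} + \frac{1}{2}\|w^{k+1} - w^k\|^2_{\mathrm{Diag}(\mathcal{D}_1, \mathcal{D}_2)} \\
& \quad + \langle v^{k+1} - v^k, \mathcal{Q}_{12}^*(u^{k+1} - u) \rangle - \langle (\mathcal{D}_1 + \mathcal{S})(u^{k+1} - u^k), u^{k+1} - u \rangle \\
& \quad - \langle (\mathcal{D}_2 + \mathcal{T})(v^{k+1} - v^k), v^{k+1} - v \rangle - (\tau\sigma)^{-1}\langle x^{k+1} - x^k, x^{k+1} - x \rangle \\
& \quad - (1 - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2.
\end{aligned}
\tag{28}
\]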

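Recall that for any $\xi$, $\zeta$ in the same space and a self-adjoint positive semidefinite operator
$\mathcal{G}$, it always holds that

\[
\langle \xi, \mathcal{G}\zeta \rangle = \frac{1}{2}(\|\xi\|^2_{\mathcal{G}} + \|\zeta\|^2_{\mathcal{G}} - \|\xi - \zeta\|^2_{\mathcal{G}}) = \frac{1}{2}(\|\xi + \zeta\|^2_{\mathcal{G}} - \|\xi\|^2_{\mathcal{G}} - \|\zeta\|^2_{\mathcal{G}}). \tag{29}
\]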
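Identity (29) is elementary, and a small numerical check makes the bookkeeping in the next step easier to follow. The sketch below evaluates the three expressions of (29) for a symmetric positive semidefinite 2x2 matrix; the matrix and vectors are arbitrary illustrative choices and the function names are hypothetical.

```python
def inner_G(x, y, G):
    """<x, G y> for a symmetric 2x2 matrix G stored as nested lists."""
    Gy = [G[0][0] * y[0] + G[0][1] * y[1], G[1][0] * y[0] + G[1][1] * y[1]]
    return x[0] * Gy[0] + x[1] * Gy[1]

def norm2_G(x, G):
    """Squared weighted norm ||x||_G^2 = <x, G x>."""
    return inner_G(x, x, G)

def identity_29(xi, zeta, G):
    """Return the three expressions appearing in (29); they should agree."""
    lhs = inner_G(xi, zeta, G)
    diff = [xi[0] - zeta[0], xi[1] - zeta[1]]
    summ = [xi[0] + zeta[0], xi[1] + zeta[1]]
    first = 0.5 * (norm2_G(xi, G) + norm2_G(zeta, G) - norm2_G(diff, G))
    second = 0.5 * (norm2_G(summ, G) - norm2_G(xi, G) - norm2_G(zeta, G))
    return lhs, first, second

# Example with G = [[2, 1], [1, 1]], which is positive definite.
values = identity_29([1.0, 2.0], [3.0, -1.0], [[2.0, 1.0], [1.0, 1.0]])
```

For the example above all three entries of `values` equal 9.0, which is the kind of polarization step used repeatedly to turn inner products of successive differences into telescoping squared norms.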

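Then we can get that

\[
\begin{cases}
\langle (\mathcal{D}_1 + \mathcal{S})(u^{k+1} - u^k), u^{k+1} - u \rangle = \frac{1}{2}(\|u^{k+1} - u^k\|^2_{\mathcal{D}_1+\mathcal{S}} + \|u^{k+1} - u\|^2_{\mathcal{D}_1+\mathcal{S}} - \|u^k - u\|^2_{\mathcal{D}_1+\mathcal{S}}), \\
\langle (\mathcal{D}_2 + \mathcal{T})(v^{k+1} - v^k), v^{k+1} - v \rangle = \frac{1}{2}(\|v^{k+1} - v^k\|^2_{\mathcal{D}_2+\mathcal{T}} + \|v^{k+1} - v\|^2_{\mathcal{D}_2+\mathcal{T}} - \|v^k - v\|^2_{\mathcal{D}_2+\mathcal{T}}), \\
\langle x^{k+1} - x^k, x^{k+1} - x \rangle = \frac{1}{2}(\|x^{k+1} - x^k\|^2 + \|x^{k+1} - x\|^2 - \|x^k - x\|^2), \\
\langle \mathcal{Q}_{22}(v^{k+1} - v^k), v^{k+1} - v \rangle = \frac{1}{2}(\|v^{k+1} - v^k\|^2_{\mathcal{Q}_{22}} + \|v^{k+1} - v\|^2_{\mathcal{Q}_{22}} - \|v^k - v\|^2_{\mathcal{Q}_{22}}).
\end{cases}
\tag{30}
\]

(i) Assume that $\tau \in (0,1]$. By using the last equation in (30), we can obtain that

\[
\begin{aligned}
\langle v^{k+1} - v^k, \mathcal{Q}_{12}^*(u^{k+1} - u) \rangle
& = \Big\langle \begin{pmatrix} 0 \\ v^{k+1} - v^k \end{pmatrix}, \mathcal{Q}(w^{k+1} - w) \Big\rangle - \langle \mathcal{Q}_{22}(v^{k+1} - v^k), v^{k+1} - v \rangle \\
& \leq \frac{1}{2}(\|v^{k+1} - v^k\|^2_{\mathcal{Q}_{22}} + \|w^{k+1} - w\|^2_{\mathcal{Q}}) - \frac{1}{2}(\|v^{k+1} - v^k\|^2_{\mathcal{Q}_{22}} + \|v^{k+1} - v\|^2_{\mathcal{Q}_{22}} - \|v^k - v\|^2_{\mathcal{Q}_{22}}) \\
& = \frac{1}{2}\|w^{k+1} - w\|^2_{\mathcal{Q}} + \frac{1}{2}(\|v^k - v\|^2_{\mathcal{Q}_{22}} - \|v^{k+1} - v\|^2_{\mathcal{Q}_{22}}),
\end{aligned}
\tag{31}
\]

where the inequality is obtained by the Cauchy-Schwarz inequality. By some simple manipulations
we can also see that

\[
\begin{aligned}
\sigma\langle \mathcal{B}^*(v^{k+1} - v^k), \mathcal{A}^*(u^{k+1} - u) \rangle
& = \frac{\sigma}{2}(\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2 - \|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\|^2) \\
& \quad + \frac{\sigma}{2}(\|\mathcal{A}^*u + \mathcal{B}^*v^k - c\|^2 - \|\mathcal{A}^*u + \mathcal{B}^*v^{k+1} - c\|^2).
\end{aligned}
\tag{32}
\]

Finally, by substituting (30), (31) and (32) into (28) and recalling the definition of
$\Phi_{k+1}(\cdot,\cdot,\cdot)$ and $\Theta_{k+1}$ in (18) and (19), we have that

\[
\begin{aligned}
& (p(u^{k+1}) + q(v^{k+1})) - (p(u) + q(v)) + \langle w^{k+1} - w, \nabla\phi(w) \rangle + \langle u^{k+1} - u, \mathcal{A}x \rangle + \langle v^{k+1} - v, \mathcal{B}x \rangle \\
& \quad - \langle \tilde{x}^{k+1} - x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle + \frac{1}{2}(\Phi_{k+1}(u,v,x) - \Phi_k(u,v,x)) \\
& \leq -\frac{1}{2}\big(\|u^{k+1} - u^k\|^2_{\mathcal{S}} + \|v^{k+1} - v^k\|^2_{\mathcal{T}} + \frac{1}{2}\|w^{k+1} - w\|^2_{\mathcal{Q}} + \frac{1}{2}\|w^k - w\|^2_{\mathcal{Q}} + \sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\|^2 \\
& \qquad + (1 - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2\big) \\
& \leq -\frac{1}{2}\big(\Theta_{k+1} + \sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\|^2 + (1 - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2\big),
\end{aligned}
\]

where the last inequality comes from the fact that

\[
\frac{1}{2}\|w^{k+1} - w\|^2_{\mathcal{Q}} + \frac{1}{2}\|w^k - w\|^2_{\mathcal{Q}} \geq \frac{1}{4}\|w^{k+1} - w^k\|^2_{\mathcal{Q}}.
\]

This completes the proof of part (i).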

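(ii) Assume that $\tau > 0$. In this part, we first reformulate (28) as

\[
\begin{aligned}
& (p(u^{k+1}) + q(v^{k+1})) - (p(u) + q(v)) + \langle w^{k+1} - w, \nabla\phi(w) \rangle + \langle u^{k+1} - u, \mathcal{A}x \rangle + \langle v^{k+1} - v, \mathcal{B}x \rangle \\
& \quad - \langle \tilde{x}^{k+1} - x, \mathcal{A}^*u + \mathcal{B}^*v - c \rangle \\
& \leq \sigma\langle \mathcal{B}^*(v^{k+1} - v^k), \mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c \rangle + \frac{\sigma}{2}(\|\mathcal{A}^*u + \mathcal{B}^*v^k - c\|^2 - \|\mathcal{A}^*u + \mathcal{B}^*v^{k+1} - c\|^2) \\
& \quad - \frac{1}{2}\|v^{k+1} - v^k\|^2_{\sigma\mathcal{B}\mathcal{B}^*} - \|w^{k+1} - w\|^2_{\mathcal{Q}} + \frac{1}{2}\|w^{k+1} - w^k\|^2_{\mathrm{Diag}(\mathcal{D}_1, \mathcal{D}_2)} - \langle \mathcal{Q}_{22}(v^{k+1} - v^k), v^{k+1} - v \rangle \\
& \quad + \langle v^{k+1} - v^k, \mathcal{Q}_{12}^*(u^{k+1} - u) + \mathcal{Q}_{22}(v^{k+1} - v) \rangle - \langle (\mathcal{D}_1 + \mathcal{S})(u^{k+1} - u^k), u^{k+1} - u \rangle \\
& \quad - \langle (\mathcal{D}_2 + \mathcal{T})(v^{k+1} - v^k), v^{k+1} - v \rangle - (\tau\sigma)^{-1}\langle x^{k+1} - x^k, x^{k+1} - x \rangle \\
& \quad - (1 - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2.
\end{aligned}
\tag{33}
\]

Next we shall estimate the following cross term

\[
\sigma\langle \mathcal{B}^*(v^{k+1} - v^k), \mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c \rangle + \langle v^{k+1} - v^k, \mathcal{Q}_{12}^*(u^{k+1} - u) + \mathcal{Q}_{22}(v^{k+1} - v) \rangle.
\]

It follows from (23) that

\[
\begin{cases}
\mathcal{B}b^{k+1} - \nabla_v\phi(w^k) - (\mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T})(v^{k+1} - v^k) - \mathcal{Q}_{12}^*(u^{k+1} - u^k) \in \partial q(v^{k+1}), \\
\mathcal{B}b^k - \nabla_v\phi(w^{k-1}) - (\mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T})(v^k - v^{k-1}) - \mathcal{Q}_{12}^*(u^k - u^{k-1}) \in \partial q(v^k).
\end{cases}
\]

Since $\nabla\phi$ is globally Lipschitz continuous, it is known from Clarke's Mean-Value Theorem
[2, Proposition 2.6.5] that there exists a self-adjoint and positive semidefinite operator
$\mathcal{W}^k \in \mathrm{conv}\{\partial^2\phi([w^{k-1}, w^k])\}$ such that

\[
\nabla\phi(w^k) - \nabla\phi(w^{k-1}) = \mathcal{W}^k(w^k - w^{k-1}), \tag{34}
\]

where the set $\mathrm{conv}\{\partial^2\phi[w^{k-1}, w^k]\}$ denotes the convex hull of all points
$\mathcal{W} \in \partial^2\phi(z)$ for any $z \in [w^{k-1}, w^k]$. Denote
$\mathcal{W}^k := \begin{pmatrix} \mathcal{W}^k_{11} & \mathcal{W}^k_{12} \\ (\mathcal{W}^k_{12})^* & \mathcal{W}^k_{22} \end{pmatrix}$, where
$\mathcal{W}^k_{11} : \mathcal{U} \to \mathcal{U}$, $\mathcal{W}^k_{22} : \mathcal{V} \to \mathcal{V}$ are self-adjoint positive semidefinite operators
and $\mathcal{W}^k_{12} : \mathcal{V} \to \mathcal{U}$ is a linear operator. Substituting (34) into (17) at $v = v^{k+1}$ and
$\hat{v} = v^k$, we obtain that

\[
\begin{aligned}
& \langle \mathcal{B}(b^{k+1} - b^k), v^{k+1} - v^k \rangle - \langle \mathcal{Q}_{22}(v^{k+1} - v^k) + \mathcal{Q}_{12}^*(u^{k+1} - u^k), v^{k+1} - v^k \rangle \\
& \geq \langle \nabla_v\phi(w^k) - \nabla_v\phi(w^{k-1}), v^{k+1} - v^k \rangle - \langle (\mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T})(v^k - v^{k-1}), v^{k+1} - v^k \rangle \\
& \quad + \|v^{k+1} - v^k\|^2_{\mathcal{T}+\mathcal{D}_2} - \langle u^k - u^{k-1}, \mathcal{Q}_{12}(v^{k+1} - v^k) \rangle \\
& = \langle u^k - u^{k-1}, (\mathcal{W}^k_{12} - \mathcal{Q}_{12})(v^{k+1} - v^k) \rangle - \langle (\mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T} - \mathcal{W}^k_{22})(v^k - v^{k-1}), v^{k+1} - v^k \rangle \\
& \quad + \|v^{k+1} - v^k\|^2_{\mathcal{T}+\mathcal{D}_2} \\
& \geq -\frac{\eta}{2}(\|u^k - u^{k-1}\|^2_{\mathcal{D}_1} + \|v^{k+1} - v^k\|^2_{\mathcal{D}_2}) - \frac{1}{2}(\|v^{k+1} - v^k\|^2_{\mathcal{T}+\mathcal{D}_2} + \|v^k - v^{k-1}\|^2_{\mathcal{T}+\mathcal{D}_2}) \\
& \quad + \|v^{k+1} - v^k\|^2_{\mathcal{T}+\mathcal{D}_2} \\
& = \frac{1}{2}\|v^{k+1} - v^k\|^2_{\mathcal{T}+(1-\eta)\mathcal{D}_2} - \frac{1}{2}\|v^k - v^{k-1}\|^2_{\mathcal{T}+\mathcal{D}_2} - \frac{\eta}{2}\|u^k - u^{k-1}\|^2_{\mathcal{D}_1},
\end{aligned}
\]

where the second inequality is obtained from (12) and the fact that $\mathcal{W}^k_{22} \succeq \mathcal{Q}_{22}$. Therefore, with
$\mu_{k+1} = (1 - \tau)\sigma\langle \mathcal{B}^*(v^{k+1} - v^k), \mathcal{A}^*u^k + \mathcal{B}^*v^k - c \rangle$, the cross term can be
estimated as

\[
\begin{aligned}
& \sigma\langle \mathcal{B}^*(v^{k+1} - v^k), \mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c \rangle + \langle \mathcal{Q}_{12}^*(u^{k+1} - u) + \mathcal{Q}_{22}(v^{k+1} - v), v^{k+1} - v^k \rangle \\
& = (1 - \tau)\sigma\langle \mathcal{B}^*(v^{k+1} - v^k), \mathcal{A}^*u^k + \mathcal{B}^*v^k - c \rangle - \langle \mathcal{B}^*(v^{k+1} - v^k), b^{k+1} - b^k \rangle \\
& \quad + \langle \mathcal{Q}_{12}^*(u^k - u) + \mathcal{Q}_{22}(v^k - v), v^{k+1} - v^k \rangle + \langle \mathcal{Q}_{12}^*(u^{k+1} - u^k) + \mathcal{Q}_{22}(v^{k+1} - v^k), v^{k+1} - v^k \rangle \\
& \leq \mu_{k+1} + \frac{1}{2}(\|w^k - w\|^2_{\mathcal{Q}} + \|v^{k+1} - v^k\|^2_{\mathcal{Q}_{22}}) - \frac{1}{2}\|v^{k+1} - v^k\|^2_{\mathcal{T}+(1-\eta)\mathcal{D}_2} \\
& \quad + \frac{1}{2}\|v^k - v^{k-1}\|^2_{\mathcal{T}+\mathcal{D}_2} + \frac{\eta}{2}\|u^k - u^{k-1}\|^2_{\mathcal{D}_1}.
\end{aligned}
\tag{35}
\]

Finally, by the Cauchy-Schwarz inequality we know that

\[
\mu_{k+1} \leq
\begin{cases}
\frac{1}{2}(1 - \tau)\sigma(\|\mathcal{B}^*(v^{k+1} - v^k)\|^2 + \|\mathcal{A}^*u^k + \mathcal{B}^*v^k - c\|^2), & \tau \in (0,1], \\
\frac{1}{2}(\tau - 1)\sigma(\tau\|\mathcal{B}^*(v^{k+1} - v^k)\|^2 + \tau^{-1}\|\mathcal{A}^*u^k + \mathcal{B}^*v^k - c\|^2), & \tau > 1.
\end{cases}
\tag{36}
\]

Substituting (30), (35) and (36) into (33), we can obtain (21). This completes the proof of part
(ii). □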


4 Convergence analysis 

With all the preparations given in the previous sections, we can now discuss the main convergence 
results of our paper. 

4.1 The global convergence 

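First we prove that under mild conditions, the iteration sequence $\{(u^k, v^k, x^k)\}$ generated by the
majorized ADMM with $\tau \in (0, \frac{1+\sqrt{5}}{2})$ converges to an optimal solution of problem (1) and
its dual.

Let $\bar{w} = (\bar{u}, \bar{v}) \in \mathcal{U} \times \mathcal{V}$ be an optimal solution of (1) and $\bar{x} \in \mathcal{X}$ be the
corresponding optimal multiplier. For $k = 0, 1, 2, \ldots$, define

\[
u_e^k = u^k - \bar{u}, \quad v_e^k = v^k - \bar{v}, \quad w_e^k = w^k - \bar{w}, \quad x_e^k = x^k - \bar{x}.
\]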


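Theorem 4.1 Suppose that the solution set of (1) is nonempty and Assumption 2.1 holds. Assume
that $\mathcal{S}$ and $\mathcal{T}$ are chosen such that

\[
\mathcal{Q}_{11} + \sigma\mathcal{A}\mathcal{A}^* + \mathcal{S} \succ 0, \quad \mathcal{Q}_{22} + \sigma\mathcal{B}\mathcal{B}^* + \mathcal{T} \succ 0.
\]

(i) Assume that $\tau \in (0,1]$. If for any $w = (u; v) \in \mathcal{U} \times \mathcal{V}$, it holds that

\[
\langle w, [\tfrac{1}{4}\mathcal{Q} + \mathrm{Diag}(\mathcal{S} + (1 - \tau)\sigma\mathcal{A}\mathcal{A}^*, \mathcal{T} + (1 - \tau)\sigma\mathcal{B}\mathcal{B}^*)]w \rangle = 0 \;\Longrightarrow\; \|u\|\|v\| = 0, \tag{37}
\]

then the generated sequence $\{(u^k, v^k)\}$ converges to an optimal solution of (1) and $\{x^k\}$ converges
to the corresponding optimal multiplier.

(ii) Assume that $\tau \in (0, \frac{1+\sqrt{5}}{2})$. Under the conditions that

\[
\begin{aligned}
& \mathcal{M} := \tfrac{1}{4}\mathcal{Q} + \mathrm{Diag}(\mathcal{S} - \eta\mathcal{D}_1, \mathcal{T} - \eta\mathcal{D}_2) \succeq 0, \\
& \tfrac{1}{4}\mathcal{Q}_{11} + \mathcal{S} + \sigma\mathcal{A}\mathcal{A}^* - \eta\mathcal{D}_1 \succ 0, \quad \tfrac{1}{4}\mathcal{Q}_{22} + \mathcal{T} + \sigma\mathcal{B}\mathcal{B}^* - \eta\mathcal{D}_2 \succ 0
\end{aligned}
\tag{38}
\]

and for any $w = (u; v) \in \mathcal{U} \times \mathcal{V}$, it holds that

\[
\langle w, [\mathcal{M} + \sigma\mathrm{Diag}(\mathcal{A}\mathcal{A}^*, \mathcal{B}\mathcal{B}^*)]w \rangle = 0 \;\Longrightarrow\; \|u\|\|v\| = 0, \tag{39}
\]

the generated sequence $\{(u^k, v^k)\}$ converges to an optimal solution of (1) and $\{x^k\}$ converges to the
corresponding optimal multiplier.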

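Proof. (i) Let $\tau \in (0,1]$. By letting $(u,v,x) = (\bar{u}, \bar{v}, \bar{x})$ in inequality (20) and using the
optimality condition (14), we can obtain that for any $k \geq 0$,

\[
\begin{aligned}
& \Phi_{k+1}(\bar{u}, \bar{v}, \bar{x}) - \Phi_k(\bar{u}, \bar{v}, \bar{x}) \\
& \leq -\big(\Theta_{k+1} + \sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\|^2 + (1 - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2\big).
\end{aligned}
\tag{40}
\]

The above inequality shows that $\{\Phi_{k+1}(\bar{u}, \bar{v}, \bar{x})\}$ is bounded, which implies that
$\{\|x_e^{k+1}\|\}$, $\{\|w_e^{k+1}\|_{\mathcal{Q}}\}$, $\{\|u_e^{k+1}\|_{\mathcal{S}}\}$ and
$\{\|v_e^{k+1}\|_{\mathcal{Q}_{22}+\sigma\mathcal{B}\mathcal{B}^*+\mathcal{T}}\}$ are all bounded. From the positive definiteness of
$\mathcal{Q}_{22} + \sigma\mathcal{B}\mathcal{B}^* + \mathcal{T}$, we can see that $\{\|v_e^{k+1}\|\}$ is bounded. By using the inequalities

\[
\begin{aligned}
\|\mathcal{A}^*u_e^{k+1}\| & \leq \|\mathcal{A}^*u_e^{k+1} + \mathcal{B}^*v_e^{k+1}\| + \|\mathcal{B}^*v_e^{k+1}\| \\
& \leq (\tau\sigma)^{-1}(\|x_e^{k+1}\| + \|x_e^k\|) + \|\mathcal{B}^*v_e^{k+1}\|, \\
\|u_e^{k+1}\|_{\mathcal{Q}_{11}} & \leq \|w_e^{k+1}\|_{\mathcal{Q}} + \|v_e^{k+1}\|_{\mathcal{Q}_{22}},
\end{aligned}
\]

we know that the sequence $\{\|u_e^{k+1}\|_{\sigma\mathcal{A}\mathcal{A}^*+\mathcal{Q}_{11}}\}$ is also bounded. Therefore,
$\{\|u_e^{k+1}\|_{\mathcal{Q}_{11}+\sigma\mathcal{A}\mathcal{A}^*+\mathcal{S}}\}$ is bounded. By the positive definiteness of
$\mathcal{Q}_{11} + \sigma\mathcal{A}\mathcal{A}^* + \mathcal{S}$, we know that $\{\|u_e^{k+1}\|\}$ is bounded. On the whole, the sequence
$\{(u^k, v^k, x^k)\}$ is bounded. Thus, there exists a subsequence $\{(u^{k_i}, v^{k_i}, x^{k_i})\}$ converging to a
cluster point, say $(u^\infty, v^\infty, x^\infty)$. Next we will prove that $(u^\infty, v^\infty)$ is optimal to (1) and
$x^\infty$ is the corresponding optimal multiplier. The inequality (40) also implies that

\[
\lim_{k \to \infty} \big(\Theta_{k+1} + \sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\|^2 + (1 - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2\big) = 0,
\]

which is equivalent to

\[
\begin{aligned}
& \lim_{k \to \infty} \|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\| = 0, \\
& \lim_{k \to \infty} (1 - \tau)\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\| = 0, \\
& \lim_{k \to \infty} \|w^{k+1} - w^k\|_{\frac{1}{4}\mathcal{Q}+\mathrm{Diag}(\mathcal{S}, \mathcal{T})} = 0.
\end{aligned}
\tag{41}
\]

For $\tau \in (0,1)$, since $\lim_{k \to \infty} \|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\| = 0$, by using (41) we see that

\[
\begin{aligned}
\lim_{k \to \infty} \|\mathcal{A}^*(u^{k+1} - u^k)\| & \leq \lim_{k \to \infty} (\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\| + \|\mathcal{A}^*u^k + \mathcal{B}^*v^k - c\|) = 0, \\
\lim_{k \to \infty} \|\mathcal{B}^*(v^{k+1} - v^k)\| & \leq \lim_{k \to \infty} (\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\| + \|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\|) = 0,
\end{aligned}
\]

which implies
$\lim_{k \to \infty} \|w^{k+1} - w^k\|_{\frac{1}{4}\mathcal{Q}+\mathrm{Diag}(\mathcal{S}+\sigma\mathcal{A}\mathcal{A}^*, \mathcal{T}+\sigma\mathcal{B}\mathcal{B}^*)} = 0$.
Therefore, for $\tau \in (0,1]$, we know that
$\lim_{k \to \infty} \|w^{k+1} - w^k\|_{\frac{1}{4}\mathcal{Q}+\mathrm{Diag}(\mathcal{S}+(1-\tau)\sigma\mathcal{A}\mathcal{A}^*, \mathcal{T}+(1-\tau)\sigma\mathcal{B}\mathcal{B}^*)} = 0$.
By condition (37) we can see that this implies either $\lim_{k \to \infty} \|u^{k+1} - u^k\| = 0$ or
$\lim_{k \to \infty} \|v^{k+1} - v^k\| = 0$. Without loss of generality we assume that
$\lim_{k \to \infty} \|v^{k+1} - v^k\| = 0$. Thus,

\[
\begin{aligned}
\lim_{k \to \infty} \|\mathcal{A}^*(u^{k+1} - u^k)\| & \leq \lim_{k \to \infty} (\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^k - c\| + \|\mathcal{A}^*u^k + \mathcal{B}^*v^{k-1} - c\| + \|\mathcal{B}^*(v^k - v^{k-1})\|) = 0, \\
\lim_{k \to \infty} \|u^{k+1} - u^k\|_{\mathcal{Q}_{11}} & \leq \lim_{k \to \infty} (\|w^{k+1} - w^k\|_{\mathcal{Q}} + \|v^{k+1} - v^k\|_{\mathcal{Q}_{22}}) = 0.
\end{aligned}
\tag{42}
\]

Therefore, $\lim_{k \to \infty} \|u^{k+1} - u^k\|_{\mathcal{Q}_{11}+\mathcal{S}+\sigma\mathcal{A}\mathcal{A}^*} = 0$. This implies
$\lim_{k \to \infty} \|u^{k+1} - u^k\| = 0$ by the positive definiteness of $\mathcal{Q}_{11} + \mathcal{S} + \sigma\mathcal{A}\mathcal{A}^*$.

Now taking limits on both sides of (22) along the subsequence $\{(u^{k_i}, v^{k_i}, x^{k_i})\}$, and by using
the closedness of the graphs of $\partial p$, $\partial q$ and the continuity of $\nabla\phi$, we obtain

\[
\begin{cases}
0 \in F(u^\infty, v^\infty, x^\infty), \\
\mathcal{A}^*u^\infty + \mathcal{B}^*v^\infty = c.
\end{cases}
\]

This indicates that $(u^\infty, v^\infty)$ is an optimal solution to (1) and $x^\infty$ is the corresponding optimal
multiplier. Since $(u^\infty, v^\infty, x^\infty)$ satisfies (13), all the above arguments involving
$(\bar{u}, \bar{v}, \bar{x})$ can be replaced by $(u^\infty, v^\infty, x^\infty)$. Thus the subsequence
$\{\Phi_{k_i}(u^\infty, v^\infty, x^\infty)\}$ converges to 0 as $k_i \to \infty$. Since
$\{\Phi_k(u^\infty, v^\infty, x^\infty)\}$ is non-increasing, we obtain that

\[
\begin{aligned}
\lim_{k \to \infty} \Phi_{k+1}(u^\infty, v^\infty, x^\infty)
& = \lim_{k \to \infty} \big((\tau\sigma)^{-1}\|x^{k+1} - x^\infty\|^2 + \|v^{k+1} - v^\infty\|^2_{\sigma\mathcal{B}\mathcal{B}^*+\mathcal{T}+\mathcal{D}_2+\mathcal{Q}_{22}} + \|u^{k+1} - u^\infty\|^2_{\mathcal{D}_1+\mathcal{S}} \\
& \qquad\quad + \frac{1}{2}\|w^{k+1} - w^\infty\|^2_{\mathcal{Q}}\big) = 0.
\end{aligned}
\tag{43}
\]

From this we can immediately get $\lim_{k \to \infty} x^{k+1} = x^\infty$ and $\lim_{k \to \infty} v^{k+1} = v^\infty$.
Similar to inequality (42) we have that $\lim_{k \to \infty} \sigma\|\mathcal{A}^*(u^{k+1} - u^\infty)\| = 0$ and
$\lim_{k \to \infty} \|u^{k+1} - u^\infty\|_{\mathcal{Q}_{11}} = 0$, which, together with (43), imply that
$\lim_{k \to \infty} \|u^{k+1} - u^\infty\| = 0$ by the positive definiteness of
$\mathcal{Q}_{11} + \mathcal{S} + \sigma\mathcal{A}\mathcal{A}^*$. Therefore, the whole sequence $\{(u^k, v^k, x^k)\}$ converges to
$(u^\infty, v^\infty, x^\infty)$, the unique limit of the sequence. This completes the proof for the first case.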


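(ii) From the inequality (21) and the optimality condition (14) we know that for any $k \geq 1$,

\[
\begin{aligned}
& (\Psi_{k+1}(\bar{u}, \bar{v}, \bar{x}) + \Xi_{k+1}) - (\Psi_k(\bar{u}, \bar{v}, \bar{x}) + \Xi_k) \\
& \leq -\big(\Gamma_{k+1} + \min(1, 1 + \tau^{-1} - \tau)\sigma\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2\big).
\end{aligned}
\tag{44}
\]

By the assumptions $\tau \in (0, \frac{1+\sqrt{5}}{2})$ and $\mathcal{M} \succeq 0$, we can obtain that
$\Gamma_{k+1} \geq 0$ and $\min(1, 1 + \tau^{-1} - \tau) > 0$. Then both $\{\Psi_{k+1}(\bar{u}, \bar{v}, \bar{x})\}$ and
$\{\Xi_{k+1}\}$ are bounded. Thus, by a similar approach to case (i), we see that the sequence
$\{(u^k, v^k, x^k)\}$ is bounded. Therefore, there exists a subsequence $\{(u^{k_i}, v^{k_i}, x^{k_i})\}$ that
converges to a cluster point, say $(u^\infty, v^\infty, x^\infty)$. Next we will prove that $(u^\infty, v^\infty)$
is optimal to (1) and $x^\infty$ is the corresponding optimal multiplier. The inequality (44)
also implies that

\[
\lim_{k \to \infty} \|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\| = 0, \quad
\lim_{k \to \infty} \|w^{k+1} - w^k\|_{\mathcal{M}} = 0, \quad
\lim_{k \to \infty} \|\mathcal{B}^*(v^{k+1} - v^k)\| = 0.
\]

By the relationship

\[
\begin{aligned}
\lim_{k \to \infty} \|\mathcal{A}^*(u^{k+1} - u^k)\|
& \leq \lim_{k \to \infty} (\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\| + \|\mathcal{A}^*u^k + \mathcal{B}^*v^k - c\| + \|\mathcal{B}^*(v^{k+1} - v^k)\|) \\
& = 0,
\end{aligned}
\]

we can further get $\lim_{k \to \infty} \|w^{k+1} - w^k\|_{\mathcal{M}+\sigma\mathrm{Diag}(\mathcal{A}\mathcal{A}^*, \mathcal{B}\mathcal{B}^*)} = 0$. Thus, by the
condition (39), we can get that either $\lim_{k \to \infty} \|u^{k+1} - u^k\| = 0$ or
$\lim_{k \to \infty} \|v^{k+1} - v^k\| = 0$. Again similar to case (i), we can see that in fact both of them
would hold by the positive definiteness of $\frac{1}{4}\mathcal{Q}_{11} + \mathcal{S} + \sigma\mathcal{A}\mathcal{A}^* - \eta\mathcal{D}_1$
and $\frac{1}{4}\mathcal{Q}_{22} + \mathcal{T} + \sigma\mathcal{B}\mathcal{B}^* - \eta\mathcal{D}_2$. The remaining proof about the convergence of
the whole sequence $\{(u^k, v^k, x^k)\}$ follows exactly the same as in case (i). This completes the proof for
the second case.

□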


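Remark 4.1 In Theorem 4.1, for $\tau \in (0,1]$, a sufficient condition for the convergence is

\[
\tfrac{1}{4}\mathcal{Q} + \mathrm{Diag}(\mathcal{S} + (1 - \tau)\sigma\mathcal{A}\mathcal{A}^*, \mathcal{T} + (1 - \tau)\sigma\mathcal{B}\mathcal{B}^*) \succ 0,
\]

and for $\tau \in [1, \frac{1+\sqrt{5}}{2})$, a sufficient condition for the convergence is

\[
\tfrac{1}{4}\mathcal{Q} + \mathrm{Diag}(\mathcal{S} - \eta\mathcal{D}_1, \mathcal{T} - \eta\mathcal{D}_2) \succeq 0, \quad
\tfrac{1}{4}\mathcal{Q} + \mathrm{Diag}(\mathcal{S} + \sigma\mathcal{A}\mathcal{A}^* - \eta\mathcal{D}_1, \mathcal{T} + \sigma\mathcal{B}\mathcal{B}^* - \eta\mathcal{D}_2) \succ 0.
\]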

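Remark 4.2 An interesting application of Theorem 4.1 is for the linearly constrained convex
optimization problem with a quadratically coupled objective function of the form

\[
\phi(w) = \frac{1}{2}\langle w, \widetilde{\mathcal{Q}}w \rangle + f(u) + g(v),
\]

where $\widetilde{\mathcal{Q}} : \mathcal{U} \times \mathcal{V} \to \mathcal{U} \times \mathcal{V}$ is a self-adjoint positive semidefinite linear
operator, $f : \mathcal{U} \to (-\infty, \infty)$ and $g : \mathcal{V} \to (-\infty, \infty)$ are two convex smooth functions with
Lipschitz continuous gradients. In this case, there exist four self-adjoint positive semidefinite
operators $\Sigma_f, \widehat{\Sigma}_f : \mathcal{U} \to \mathcal{U}$ and $\Sigma_g, \widehat{\Sigma}_g : \mathcal{V} \to \mathcal{V}$ such that

\[
\Sigma_f \preceq \xi \preceq \widehat{\Sigma}_f \quad \forall\, \xi \in \partial^2 f(u), \; u \in \mathcal{U}
\]

and

\[
\Sigma_g \preceq \zeta \preceq \widehat{\Sigma}_g \quad \forall\, \zeta \in \partial^2 g(v), \; v \in \mathcal{V},
\]

where $\partial^2 f$ and $\partial^2 g$ are defined in (7). Then by letting
$\mathcal{Q} = \widetilde{\mathcal{Q}} + \mathrm{Diag}(\Sigma_f, \Sigma_g)$ in (9) and
$\mathcal{Q} + \mathcal{H} = \widetilde{\mathcal{Q}} + \mathrm{Diag}(\widehat{\Sigma}_f, \widehat{\Sigma}_g)$ in (10), we have $\eta = 0$ in (12).
This implies that $\mathcal{M} \succeq 0$ always holds in (38). Therefore, for
$\tau \in (0, \frac{1+\sqrt{5}}{2})$, the conditions for the convergence can be equivalently written as

\[
\widetilde{\mathcal{Q}}_{11} + \Sigma_f + \mathcal{S} + \sigma\mathcal{A}\mathcal{A}^* \succ 0, \quad \widetilde{\mathcal{Q}}_{22} + \Sigma_g + \mathcal{T} + \sigma\mathcal{B}\mathcal{B}^* \succ 0 \tag{45}
\]

and

\[
\langle w, [\widetilde{\mathcal{Q}} + \mathrm{Diag}(\Sigma_f + \mathcal{S} + \sigma\mathcal{A}\mathcal{A}^*, \Sigma_g + \mathcal{T} + \sigma\mathcal{B}\mathcal{B}^*)]w \rangle = 0 \;\Longrightarrow\; \|u\|\|v\| = 0. \tag{46}
\]

A sufficient condition for ensuring (45) and (46) to hold is

\[
\widetilde{\mathcal{Q}} + \mathrm{Diag}(\Sigma_f + \mathcal{S} + \sigma\mathcal{A}\mathcal{A}^*, \Sigma_g + \mathcal{T} + \sigma\mathcal{B}\mathcal{B}^*) \succ 0. \tag{47}
\]

If $\widetilde{\mathcal{Q}} = 0$, i.e., if the objective function of the original problem (1) is separable, we will
recover the convergence conditions given in [...] for a majorized ADMM with semi-proximal terms.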


4.2 The non-ergodic iteration complexity for general coupled objective functions

In this section, we will present the non-ergodic iteration complexity for the majorized ADMM in
terms of the KKT optimality condition.

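Theorem 4.2 Suppose that the solution set of (1) is nonempty and Assumption 2.1 holds. Assume
that one of the following conditions holds:

(i) $\tau \in (0,1]$ and $\mathcal{O}_1 := \frac{1}{4}\mathcal{Q} + \mathrm{Diag}(\mathcal{S} + (1 - \tau)\sigma\mathcal{A}\mathcal{A}^*, \mathcal{T} + (1 - \tau)\sigma\mathcal{B}\mathcal{B}^*) \succ 0$;

(ii) $\tau \in (0, \frac{1+\sqrt{5}}{2})$, $\frac{1}{4}\mathcal{Q} + \mathrm{Diag}(\mathcal{S} - \eta\mathcal{D}_1, \mathcal{T} - \eta\mathcal{D}_2) \succeq 0$ and
$\mathcal{O}_2 := \frac{1}{4}\mathcal{Q} + \mathrm{Diag}(\mathcal{S} + \sigma\mathcal{A}\mathcal{A}^* - \eta\mathcal{D}_1, \mathcal{T} + \sigma\mathcal{B}\mathcal{B}^* - \eta\mathcal{D}_2) \succ 0$.

Then there exists a constant $C$ only depending on the initial point and the optimal solution set such
that the sequence $\{(u^k, v^k, x^k)\}$ generated by the majorized ADMM satisfies that for $k \geq 1$,

\[
\min_{1 \leq i \leq k} \{ \mathrm{dist}^2(0, F(u^{i+1}, v^{i+1}, x^{i+1})) + \|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c\|^2 \} \leq C/k. \tag{48}
\]

Furthermore, for the limiting case we have that

\[
\lim_{k \to \infty} k \Big( \min_{1 \leq i \leq k} \{ \mathrm{dist}^2(0, F(u^{i+1}, v^{i+1}, x^{i+1})) + \|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c\|^2 \} \Big) = 0. \tag{49}
\]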
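The quantity bounded in (48) and (49) is a running minimum of a KKT residual, and it can be tracked along the iterations. The sketch below does this on a scalar toy instance; all specifics are assumptions for illustration only and do not come from the paper: A = B = 1, p(u) = lam*|u|, q = 0, phi(u, v) = 0.5*(u - v)^2 + log(1 + exp(u)), Hessian bounds Q = [[1, -1], [-1, 1]] and H = Diag(1/4, 0), and S = T = 0. The loop also records the residual of the very first iterate, which is slightly more than the index range used in the theorem.

```python
import math

def soft_threshold(z, lam):
    """Proximal map of lam*|.|."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def grad_phi(u, v):
    """Gradient of the toy phi(u, v) = 0.5*(u - v)**2 + log(1 + exp(u))."""
    sig = 1.0 / (1.0 + math.exp(-u))
    return (u - v) + sig, (v - u)

def kkt_residual(u, v, x, c, lam):
    """dist^2(0, F(u, v, x)) + (u + v - c)^2 for p = lam*|.|, q = 0, A = B = 1."""
    gu, gv = grad_phi(u, v)
    zu = gu + x
    if u != 0.0:
        du = abs(zu + math.copysign(lam, u))   # subdifferential is a single point
    else:
        du = max(abs(zu) - lam, 0.0)           # distance to the interval [-lam, lam]
    dv = abs(gv + x)
    return du * du + dv * dv + (u + v - c) ** 2

def running_min_residuals(c=1.0, lam=0.5, sigma=1.0, tau=1.6, iters=200):
    """Run the scalar majorized ADMM and return the running minimum residuals."""
    u = v = x = 0.0
    best, history = float("inf"), []
    for _ in range(iters):
        gu, gv = grad_phi(u, v)
        # Step 1 with Q11 + D1 + S = 1.25.
        u_new = soft_threshold(1.25 * u - gu - x - sigma * (v - c), lam) / (1.25 + sigma)
        # Step 2 with Q22 + D2 + T = 1 and Q12 = -1.
        v_new = (v - gv + (u_new - u) - x - sigma * (u_new - c)) / (1.0 + sigma)
        # Step 3.
        x = x + tau * sigma * (u_new + v_new - c)
        u, v = u_new, v_new
        best = min(best, kkt_residual(u, v, x, c, lam))
        history.append(best)
    return history
```

Multiplying the last entry of the returned list by the iteration count gives the quantity whose limit is stated in (49). On this small strongly convex instance it decays much faster than the worst-case O(1/k) bound suggests.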

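Proof. From the optimality condition for $(u^{k+1}, v^{k+1})$, we know that

\[
\begin{aligned}
& \begin{pmatrix}
-(1 - \tau)\sigma\mathcal{A}(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c) - \sigma\mathcal{A}\mathcal{B}^*(v^k - v^{k+1}) - \mathcal{S}(u^{k+1} - u^k) + \mathcal{Q}_{12}(v^{k+1} - v^k) \\
-(1 - \tau)\sigma\mathcal{B}(\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c) - \mathcal{T}(v^{k+1} - v^k)
\end{pmatrix} \\
& \quad - (\mathcal{Q} + \mathcal{H})(w^{k+1} - w^k) + \nabla\phi(w^{k+1}) - \nabla\phi(w^k) \in F(u^{k+1}, v^{k+1}, x^{k+1}).
\end{aligned}
\]

Therefore, we can obtain that

\[
\begin{aligned}
& \mathrm{dist}^2(0, F(u^{k+1}, v^{k+1}, x^{k+1})) + \|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2 \\
& \leq 5\|\sigma\mathcal{A}\mathcal{B}^*(v^{k+1} - v^k)\|^2 + 5(1 - \tau)^2\sigma^2(\|\mathcal{A}\|^2 + \|\mathcal{B}\|^2)\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2 \\
& \quad + 5\|(\mathcal{Q} + \mathcal{H})(w^{k+1} - w^k) - \nabla\phi(w^{k+1}) + \nabla\phi(w^k)\|^2 + 5\|\mathcal{Q}_{12}(v^{k+1} - v^k)\|^2 + 5\|\mathcal{T}(v^{k+1} - v^k)\|^2 \\
& \quad + 5\|\mathcal{S}(u^{k+1} - u^k)\|^2 + \|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2 \\
& \leq 5\sigma\|\mathcal{A}\|^2\|v^{k+1} - v^k\|^2_{\sigma\mathcal{B}\mathcal{B}^*} + (5(1 - \tau)^2\sigma^2(\|\mathcal{A}\|^2 + \|\mathcal{B}\|^2) + 1)\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2 \\
& \quad + 5\|\mathcal{H}\|\|w^{k+1} - w^k\|^2_{\mathcal{H}} + 5\|\sqrt{\mathcal{Q}_{12}^*\mathcal{Q}_{12}}\|\|v^{k+1} - v^k\|^2_{\sqrt{\mathcal{Q}_{12}^*\mathcal{Q}_{12}}} + 5\|\mathcal{S}\|\|u^{k+1} - u^k\|^2_{\mathcal{S}} \\
& \quad + 5\|\mathcal{T}\|\|v^{k+1} - v^k\|^2_{\mathcal{T}} \\
& \leq C_1\|w^{k+1} - w^k\|^2_{\widehat{\mathcal{O}}} + C_2\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2,
\end{aligned}
\tag{50}
\]

where

\[
\begin{aligned}
& C_1 = 5\max(\sigma\|\mathcal{A}\|^2, \|\sqrt{\mathcal{Q}_{12}^*\mathcal{Q}_{12}}\|, \|\mathcal{H}\|, \|\mathcal{S}\|, \|\mathcal{T}\|), \quad C_2 = 5(1 - \tau)^2\sigma^2(\|\mathcal{A}\|^2 + \|\mathcal{B}\|^2) + 1, \\
& \widehat{\mathcal{O}} = \mathcal{H} + \mathrm{Diag}(\mathcal{S}, \mathcal{T} + \sigma\mathcal{B}\mathcal{B}^* + \sqrt{\mathcal{Q}_{12}^*\mathcal{Q}_{12}})
\end{aligned}
\]

and the second inequality comes from the fact that there exists some
$\mathcal{W}^k \in \mathrm{conv}\{\partial^2\phi([w^k, w^{k+1}])\}$ such that

\[
\|(\mathcal{Q} + \mathcal{H})(w^{k+1} - w^k) - \nabla\phi(w^{k+1}) + \nabla\phi(w^k)\|^2
= \|(\mathcal{Q} + \mathcal{H} - \mathcal{W}^k)(w^{k+1} - w^k)\|^2 \leq \|\mathcal{H}\|\|w^{k+1} - w^k\|^2_{\mathcal{H}}.
\]

Next we will estimate the upper bounds for $\|w^{k+1} - w^k\|^2_{\widehat{\mathcal{O}}}$ and
$\|\mathcal{A}^*u^{k+1} + \mathcal{B}^*v^{k+1} - c\|^2$ by only involving the initial point and the optimal solution set
under the two different conditions.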

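First, assume condition (i) holds. For $\tau \in (0,1]$, by using (40) in the proof of Theorem 4.1 we
have that for $i \geq 1$,

\[
\begin{aligned}
& \|w^{i+1} - w^i\|^2_{\frac{1}{4}\mathcal{Q}+\mathrm{Diag}(\mathcal{S}, \mathcal{T})} + \sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^i - c\|^2 + (1 - \tau)\sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c\|^2 \\
& \leq \Phi_i(\bar{u}, \bar{v}, \bar{x}) - \Phi_{i+1}(\bar{u}, \bar{v}, \bar{x}),
\end{aligned}
\]

which implies that

\[
\begin{aligned}
& \sum_{i=1}^{k} \big(\|w^{i+1} - w^i\|^2_{\frac{1}{4}\mathcal{Q}+\mathrm{Diag}(\mathcal{S}, \mathcal{T})} + \sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^i - c\|^2 + (1 - \tau)\sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c\|^2\big) \\
& \leq \Phi_1(\bar{u}, \bar{v}, \bar{x}) - \Phi_{k+1}(\bar{u}, \bar{v}, \bar{x}) \leq \Phi_1(\bar{u}, \bar{v}, \bar{x}).
\end{aligned}
\]

This shows that

\[
\begin{aligned}
& \sum_{i=1}^{k} \|w^{i+1} - w^i\|^2_{\frac{1}{4}\mathcal{Q}+\mathrm{Diag}(\mathcal{S}, \mathcal{T})} \leq \Phi_1(\bar{u}, \bar{v}, \bar{x}), \\
& \sum_{i=1}^{k} \sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^i - c\|^2 \leq \Phi_1(\bar{u}, \bar{v}, \bar{x}), \\
& \sum_{i=1}^{k} (1 - \tau)\sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c\|^2 \leq \Phi_1(\bar{u}, \bar{v}, \bar{x}).
\end{aligned}
\tag{51}
\]

From the above three inequalities we can also get that

\[
\begin{aligned}
(1 - \tau)\sum_{i=1}^{k} \|u^{i+1} - u^i\|^2_{\sigma\mathcal{A}\mathcal{A}^*}
& \leq (1 - \tau)\sum_{i=1}^{k} (2\sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^i - c\|^2 + 2\sigma\|\mathcal{A}^*u^i + \mathcal{B}^*v^i - c\|^2) \\
& \leq 2(2 - \tau)\Phi_1(\bar{u}, \bar{v}, \bar{x}), \\
(1 - \tau)\sum_{i=1}^{k} \|v^{i+1} - v^i\|^2_{\sigma\mathcal{B}\mathcal{B}^*}
& \leq (1 - \tau)\sum_{i=1}^{k} (2\sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^i - c\|^2 + 2\sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c\|^2) \\
& \leq 2(2 - \tau)\Phi_1(\bar{u}, \bar{v}, \bar{x}).
\end{aligned}
\]

With the notation of operator $\mathcal{O}_1$ we have that

\[
\begin{aligned}
\sum_{i=1}^{k} \|w^{i+1} - w^i\|^2_{\mathcal{O}_1}
& = \sum_{i=1}^{k} \|w^{i+1} - w^i\|^2_{\frac{1}{4}\mathcal{Q}+\mathrm{Diag}(\mathcal{S}, \mathcal{T})} + \sum_{i=1}^{k} \|w^{i+1} - w^i\|^2_{(1-\tau)\sigma\mathrm{Diag}(\mathcal{A}\mathcal{A}^*, \mathcal{B}\mathcal{B}^*)} \\
& \leq (9 - 4\tau)\Phi_1(\bar{u}, \bar{v}, \bar{x}).
\end{aligned}
\tag{52}
\]

If $\tau \in (0,1)$, we further have that

\[
\sum_{i=1}^{k} \|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c\|^2 \leq (1 - \tau)^{-1}\sigma^{-1}\Phi_1(\bar{u}, \bar{v}, \bar{x}). \tag{53}
\]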


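If $\tau = 1$, by the condition that $\mathcal{O}_1 = \frac12\mathcal{Q} + \mathrm{Diag}(\mathcal{S},\mathcal{T}) \succ 0$, we have that

\[
\begin{aligned}
\sum_{i=1}^{k}\|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2
&\le \sum_{i=1}^{k}\big(2\|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i}-c\|^2 + 2\|v^{i+1}-v^i\|^2_{\mathcal{B}\mathcal{B}^*}\big)\\
&\le \sum_{i=1}^{k}\big(2\|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i}-c\|^2 + 2\|\mathcal{O}_1^{-\frac12}\mathrm{Diag}(0,\mathcal{B}\mathcal{B}^*)\mathcal{O}_1^{-\frac12}\|\,\|w^{i+1}-w^i\|^2_{\mathcal{O}_1}\big)\\
&\le \big(2\sigma^{-1} + (18-8\tau)\|\mathcal{O}_1^{-\frac12}\mathrm{Diag}(0,\mathcal{B}\mathcal{B}^*)\mathcal{O}_1^{-\frac12}\|\big)\Phi_1(\bar u,\bar v,\bar x),
\end{aligned}
\tag{54}
\]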

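where the second inequality is obtained from the fact that for any self-adjoint positive definite operator $\mathcal{G}$ with square root $\mathcal{G}^{\frac12}$ and any self-adjoint positive semidefinite operator $\widehat{\mathcal{G}}$ defined in the same Hilbert space, it always holds that

\[
\|\xi\|^2_{\widehat{\mathcal{G}}} = \langle \xi, \widehat{\mathcal{G}}\xi\rangle
= \langle \mathcal{G}^{\frac12}\xi, (\mathcal{G}^{-\frac12}\widehat{\mathcal{G}}\mathcal{G}^{-\frac12})\mathcal{G}^{\frac12}\xi\rangle
\le \|\mathcal{G}^{-\frac12}\widehat{\mathcal{G}}\mathcal{G}^{-\frac12}\|\,\|\xi\|^2_{\mathcal{G}}.
\]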
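The weighted-norm inequality above is easy to sanity-check numerically. The following is a minimal sketch, assuming finite-dimensional real matrices stand in for the operators; the function name `weighted_norm_bound` and the random test data are illustrative choices, not part of the paper.

```python
import numpy as np

def weighted_norm_bound(seed=0, n=6):
    """Return (lhs, rhs) of ||xi||^2_Ghat <= ||G^{-1/2} Ghat G^{-1/2}|| * ||xi||^2_G."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    G = M @ M.T + n * np.eye(n)          # symmetric positive definite
    N = rng.standard_normal((n, n - 2))
    G_hat = N @ N.T                       # symmetric positive semidefinite, rank-deficient
    # Inverse square root of G from its eigendecomposition.
    evals, evecs = np.linalg.eigh(G)
    G_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Spectral norm (largest singular value) of G^{-1/2} Ghat G^{-1/2}.
    factor = np.linalg.norm(G_inv_half @ G_hat @ G_inv_half, 2)
    xi = rng.standard_normal(n)
    lhs = xi @ G_hat @ xi                 # ||xi||^2 in the Ghat-weighted seminorm
    rhs = factor * (xi @ G @ xi)          # scaled ||xi||^2 in the G-weighted norm
    return lhs, rhs

lhs, rhs = weighted_norm_bound()
print("bound holds:", lhs <= rhs + 1e-10)
```

In the proof this is applied with $\mathcal{G} = \mathcal{O}_1$ and $\widehat{\mathcal{G}} = \mathrm{Diag}(0,\mathcal{B}\mathcal{B}^*)$.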

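Therefore, by using (50), the bounds (52), (53) and (54), and the positive definiteness of operator $\mathcal{O}_1$, we know that

\[
\begin{aligned}
&\min_{1\le i\le k}\big\{\mathrm{dist}^2(0,F(u^{i+1},v^{i+1},x^{i+1})) + \|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2\big\}\\
&\quad\le \Big(\sum_{i=1}^{k}\big(\mathrm{dist}^2(0,F(u^{i+1},v^{i+1},x^{i+1})) + \|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2\big)\Big)/k
\le C\,\Phi_1(\bar u,\bar v,\bar x)/k,
\end{aligned}
\]

where

\[
C = \begin{cases}
C_1(9-4\tau)\|\mathcal{O}_1^{-\frac12}\widehat{\mathcal{O}}\mathcal{O}_1^{-\frac12}\| + C_2(1-\tau)^{-1}\sigma^{-1}, & \tau\in(0,1),\\[2pt]
C_1(9-4\tau)\|\mathcal{O}_1^{-\frac12}\widehat{\mathcal{O}}\mathcal{O}_1^{-\frac12}\| + C_2\big(2\sigma^{-1} + (18-8\tau)\|\mathcal{O}_1^{-\frac12}\mathrm{Diag}(0,\mathcal{B}\mathcal{B}^*)\mathcal{O}_1^{-\frac12}\|\big), & \tau = 1.
\end{cases}
\]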

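To prove the limiting case (49), by using inequalities (52), (53), (54) and [?, Lemma 2.1], we have that

\[
\min_{1\le i\le k}\|w^{i+1}-w^i\|^2_{\mathcal{O}_1} = o(1/k), \qquad
\min_{1\le i\le k}\|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2 = o(1/k),
\]

which, together with (50), imply that

\[
\begin{aligned}
&\lim_{k\to\infty} k\Big(\min_{1\le i\le k}\big\{\mathrm{dist}^2(0,F(u^{i+1},v^{i+1},x^{i+1})) + \|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2\big\}\Big)\\
&\quad\le \lim_{k\to\infty} k\Big(\min_{1\le i\le k}\big\{C_1\|\mathcal{O}_1^{-\frac12}\widehat{\mathcal{O}}\mathcal{O}_1^{-\frac12}\|\,\|w^{i+1}-w^i\|^2_{\mathcal{O}_1} + C_2\|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2\big\}\Big) = 0.
\end{aligned}
\]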
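The $o(1/k)$ rate used here rests on a standard fact about summable nonnegative sequences. The following is a sketch of one common argument, where $a_i \ge 0$ with $\sum_i a_i < \infty$ is illustrative notation for a single summable sequence such as $\|w^{i+1}-w^i\|^2_{\mathcal{O}_1}$; the cited lemma may be stated or proved differently.

```latex
% Illustrative notation: a_i >= 0 and \sum_{i \ge 1} a_i < \infty.
% The minimum over 1..k is at most the average over the last ~k/2 terms,
% and the tail of a convergent series tends to zero.
\[
k \min_{1\le i\le k} a_i
\;\le\; k \min_{\lceil k/2\rceil \le i\le k} a_i
\;\le\; \frac{k}{k-\lceil k/2\rceil+1}\sum_{i=\lceil k/2\rceil}^{k} a_i
\;\le\; 2\sum_{i\ge \lceil k/2\rceil} a_i
\;\longrightarrow\; 0 \quad (k\to\infty).
\]
```

The factor 2 comes from $k-\lceil k/2\rceil+1 \ge k/2$.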

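This completes the proof of the conclusions under condition (i). Next, we consider the case under condition (ii); the proof is similar to the one under condition (i). For $\tau \in (0,\frac{1+\sqrt5}{2})$, let $\rho(\tau) = \min(\tau, 1+\tau-\tau^2)$. We know from (38) that for $\tau \in (0,\frac{1+\sqrt5}{2})$ and any $k \ge 1$,

\[
\begin{aligned}
&\sum_{i=1}^{k}\Big(\|w^{i+1}-w^i\|^2_{\frac12\mathcal{Q}+\mathrm{Diag}(\mathcal{S}-\eta\mathcal{D}_1,\,\mathcal{T}-\eta\mathcal{D}_2)}
+ \rho(\tau)\|v^{i+1}-v^i\|^2_{\sigma\mathcal{B}\mathcal{B}^*}
+ \tau^{-1}\rho(\tau)\,\sigma\|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2\Big)\\
&\quad\le (\Psi_1(\bar u,\bar v,\bar x)+\Xi_1) - (\Psi_{k+1}(\bar u,\bar v,\bar x)+\Xi_{k+1})
\le \Psi_1(\bar u,\bar v,\bar x)+\Xi_1.
\end{aligned}
\]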

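Thus, by the positive semidefiniteness of $\frac12\mathcal{Q} + \mathrm{Diag}(\mathcal{S}-\eta\mathcal{D}_1, \mathcal{T}-\eta\mathcal{D}_2)$, we can get that

\[
\begin{aligned}
\sum_{i=1}^{k}\|w^{i+1}-w^i\|^2_{\frac12\mathcal{Q}+\mathrm{Diag}(\mathcal{S}-\eta\mathcal{D}_1,\,\mathcal{T}-\eta\mathcal{D}_2)} &\le \Psi_1(\bar u,\bar v,\bar x)+\Xi_1,\\
\sum_{i=1}^{k}\|v^{i+1}-v^i\|^2_{\sigma\mathcal{B}\mathcal{B}^*} &\le (\Psi_1(\bar u,\bar v,\bar x)+\Xi_1)/\rho(\tau),\\
\sum_{i=1}^{k}\sigma\|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2 &\le \tau(\Psi_1(\bar u,\bar v,\bar x)+\Xi_1)/\rho(\tau),
\end{aligned}
\tag{55}
\]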

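which implies that

\[
\begin{aligned}
\sum_{i=1}^{k}\|u^{i+1}-u^i\|^2_{\sigma\mathcal{A}\mathcal{A}^*}
&\le \sum_{i=1}^{k}\big(3\sigma\|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2 + 3\sigma\|\mathcal{A}^*u^{i}+\mathcal{B}^*v^{i}-c\|^2 + 3\|v^{i+1}-v^i\|^2_{\sigma\mathcal{B}\mathcal{B}^*}\big)\\
&\le (6\tau+2)(\Psi_1(\bar u,\bar v,\bar x)+\Xi_1)/\rho(\tau).
\end{aligned}
\tag{56}
\]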


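Combining (55) and (56), one can find that

\[
\begin{aligned}
\sum_{i=1}^{k}\|w^{i+1}-w^i\|^2_{\mathcal{O}_2}
&= \sum_{i=1}^{k}\|w^{i+1}-w^i\|^2_{\frac12\mathcal{Q}+\mathrm{Diag}(\mathcal{S}-\eta\mathcal{D}_1,\,\mathcal{T}-\eta\mathcal{D}_2)}
+ \sum_{i=1}^{k}\|w^{i+1}-w^i\|^2_{\mathrm{Diag}(\sigma\mathcal{A}\mathcal{A}^*,\,\sigma\mathcal{B}\mathcal{B}^*)}\\
&\le \big(1 + (6\tau+3)/\rho(\tau)\big)(\Psi_1(\bar u,\bar v,\bar x)+\Xi_1).
\end{aligned}
\tag{57}
\]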
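Therefore, by using (50), (55), (57), and recalling the positive definiteness of operator $\mathcal{O}_2$, we finally have that

\[
\begin{aligned}
&\min_{1\le i\le k}\big\{\mathrm{dist}^2(0,F(u^{i+1},v^{i+1},x^{i+1})) + \|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2\big\}\\
&\quad\le \Big(\sum_{i=1}^{k}\big(\mathrm{dist}^2(0,F(u^{i+1},v^{i+1},x^{i+1})) + \|\mathcal{A}^*u^{i+1}+\mathcal{B}^*v^{i+1}-c\|^2\big)\Big)/k\\
&\quad\le C'(\Psi_1(\bar u,\bar v,\bar x)+\Xi_1)/k,
\end{aligned}
\]

where $C' = C_1\|\mathcal{O}_2^{-\frac12}\widehat{\mathcal{O}}\mathcal{O}_2^{-\frac12}\|\big(1 + (6\tau+3)/\rho(\tau)\big) + C_2\sigma^{-1}\tau/\rho(\tau)$. The limiting property (49) can be derived in the same way as for the case under condition (i). This completes the proof of Theorem 4.2. □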

Remark 4.3 Theorem 4.2 gives the non-ergodic complexity of the KKT optimality condition, which does not seem to be known even for the classic ADMM with separable objective functions. For the latter, Davis and Yin [3] provided related non-ergodic iteration complexity results for the primal feasibility and the objective functions, and constructed an interesting example to show that for any $\alpha > 1/2$, there exists an initial point such that the sequence $\{(u^k, v^k)\}$ generated by the classic ADMM ($\tau = 1$) satisfies $\|\mathcal{A}^*u^k + \mathcal{B}^*v^k - c\| \ge 1/k^{\alpha}$. This implies that the non-ergodic iteration complexity results presented in Theorem 4.2 in terms of the KKT optimality condition may be optimal.


4.3 The ergodic iteration complexity for general coupled objective functions 

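In this section, we discuss the ergodic iteration complexity of the majorized ADMM for solving problem (1). For $k = 1, 2, \ldots$, denote

\[
\hat x^k = \frac1k\sum_{i=1}^{k}\tilde x^{i+1}, \qquad
\hat u^k = \frac1k\sum_{i=1}^{k}u^{i+1}, \qquad
\hat v^k = \frac1k\sum_{i=1}^{k}v^{i+1},
\]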

and 

{ A — |L.A:+1||2 I |L,/c+1||2 i 1 m ^/c+l ||2 

J'-fc+l — \\Ue IIDi+S + iTe \\v2+T+Q22+(tBB* + II > 

Afc+i = Afc+i + Hfc +1 + II g + max(l - r, 1 - r“^)cT||M*u^+^ + - c|p. 

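Theorem 4.3 Suppose that $\mathcal{S}$ and $\mathcal{T}$ are chosen such that

\[
\mathcal{Q}_{11} + \sigma\mathcal{A}\mathcal{A}^* + \mathcal{S} \succ 0, \qquad \mathcal{Q}_{22} + \sigma\mathcal{B}\mathcal{B}^* + \mathcal{T} \succ 0.
\]

Assume that either (a) $\tau \in (0,1]$ and (??) holds or (b) $\tau \in (0,\frac{1+\sqrt5}{2})$ and (??) and (??) hold. Then there exist constants $D_1$, $D_2$ and $D_3$ that depend only on the initial point and the optimal solution set such that for $k \ge 1$, the following conclusions hold:

(i)

\[
\|\mathcal{A}^*\hat u^k + \mathcal{B}^*\hat v^k - c\| \le D_1/k. \tag{58}
\]

(ii) For any $(u,v,x) \in \mathcal{B}_k := \{(u,v,x) \in \mathcal{U}\times\mathcal{V}\times\mathcal{X} \mid \|(u,v,x) - (\hat u^k,\hat v^k,\hat x^k)\| \le 1\}$,

\[
\begin{aligned}
&(p(\hat u^k) + q(\hat v^k)) - (p(u) + q(v)) + \langle \hat w^k - w, \nabla\phi(w)\rangle + \langle \hat u^k - u, \mathcal{A}x\rangle + \langle \hat v^k - v, \mathcal{B}x\rangle\\
&\quad - \langle \hat x^k - x, \mathcal{A}^*u + \mathcal{B}^*v - c\rangle \le D_2/k.
\end{aligned}
\tag{59}
\]

(iii) For case (b), if we further assume that $\mathcal{S} - \eta\mathcal{D}_1 \succeq 0$ and $\mathcal{T} - \eta\mathcal{D}_2 \succeq 0$, then

\[
|\theta(\hat u^k,\hat v^k) - \theta(\bar u,\bar v)| \le D_3/k. \tag{60}
\]

The inequality holds for case (a) without additional assumptions.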

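Proof. (i) Under the conditions for case (a), the inequality (40) indicates that $\{\Phi_{k+1}(\bar u,\bar v,\bar x)\}$ is a non-increasing sequence, which implies that

\[
(\tau\sigma)^{-1}\|x^{k+1} - \bar x\|^2 \le \Phi_{k+1}(\bar u,\bar v,\bar x) \le \Phi_1(\bar u,\bar v,\bar x).
\]

Similarly, under the conditions for case (b), we can get from (44) that

\[
(\tau\sigma)^{-1}\|x^{k+1} - \bar x\|^2 \le \Psi_{k+1}(\bar u,\bar v,\bar x) + \Xi_{k+1} \le \Psi_1(\bar u,\bar v,\bar x) + \Xi_1.
\]

Therefore, in terms of the ergodic primal feasibility, we have that

\[
\begin{aligned}
\|\mathcal{A}^*\hat u^k + \mathcal{B}^*\hat v^k - c\|^2
&= \Big\|\frac1k\sum_{i=1}^{k}(\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c)\Big\|^2\\
&= \Big\|(\tau\sigma)^{-1}\sum_{i=1}^{k}(x^{i+1} - x^i)\Big\|^2/k^2\\
&= \|(\tau\sigma)^{-1}(x^{k+1} - x^1)\|^2/k^2\\
&\le 2\|(\tau\sigma)^{-1}(x^{k+1} - \bar x)\|^2/k^2 + 2\|(\tau\sigma)^{-1}(x^1 - \bar x)\|^2/k^2 \le C_3/k^2,
\end{aligned}
\tag{61}
\]

where for case (a), $C_3 = 2(\tau\sigma)^{-1}\Phi_1(\bar u,\bar v,\bar x) + 2\|(\tau\sigma)^{-1}(x^1 - \bar x)\|^2$ and for case (b), $C_3 = 2(\tau\sigma)^{-1}(\Psi_1(\bar u,\bar v,\bar x) + \Xi_1) + 2\|(\tau\sigma)^{-1}(x^1 - \bar x)\|^2$. Then, by taking the square root of inequality (61), we obtain (58).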
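The telescoping step behind (61) can be observed on a toy instance. The sketch below assumes a separable quadratic problem, $\min \frac12\|u-a\|^2 + \frac12\|v-b\|^2$ subject to $u + v = c$ (so both linear operators are the identity and there is no coupled term), solved by the classic 2-block ADMM with multiplier step $\tau\sigma$. The problem data, the function name `ergodic_residual_check` and the zero initial point are hypothetical choices made only for illustration.

```python
import numpy as np

def ergodic_residual_check(tau=1.0, sigma=1.0, k=50, seed=0):
    """Run k ADMM steps and compare the averaged residual with
    the telescoped multiplier difference (x^{k+1} - x^1) / (tau*sigma*k)."""
    rng = np.random.default_rng(seed)
    n = 4
    a, b, c = rng.standard_normal((3, n))
    v = np.zeros(n)                      # v^1
    x = np.zeros(n)                      # x^1 (multiplier)
    x_first = x.copy()
    residual_sum = np.zeros(n)
    for _ in range(k):
        # u-step: minimise 0.5||u-a||^2 + <x,u> + (sigma/2)||u+v-c||^2 in closed form
        u = (a - x - sigma * (v - c)) / (1.0 + sigma)
        # v-step: same structure with the new u
        v = (b - x - sigma * (u - c)) / (1.0 + sigma)
        r = u + v - c                    # primal residual at the new iterate
        x = x + tau * sigma * r          # multiplier update with step length tau
        residual_sum += r
    averaged = residual_sum / k          # residual of the ergodic iterate (by linearity)
    telescoped = (x - x_first) / (tau * sigma * k)
    return averaged, telescoped

avg, tel = ergodic_residual_check(tau=1.2)
print(np.linalg.norm(avg), np.allclose(avg, tel))
```

Because the constraint is linear, the residual at the averaged iterate equals the average of the residuals, which is why a bounded multiplier sequence gives the $1/k$ rate in (58).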


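(ii) First, assume that the conditions for case (a) hold. Then the inequality (20) implies that for $i \ge 1$,

\[
\begin{aligned}
&(p(u^{i+1}) + q(v^{i+1})) - (p(u) + q(v)) + \langle w^{i+1} - w, \nabla\phi(w)\rangle + \langle u^{i+1} - u, \mathcal{A}x\rangle + \langle v^{i+1} - v, \mathcal{B}x\rangle\\
&\quad - \langle \tilde x^{i+1} - x, \mathcal{A}^*u + \mathcal{B}^*v - c\rangle
\le -\tfrac12\big(\Phi_{i+1}(u,v,x) - \Phi_i(u,v,x)\big).
\end{aligned}
\]

Thus, summing up the above inequalities over $i = 1, \ldots, k$ and using the convexity of functions $p$ and $q$, we obtain that

\[
\begin{aligned}
&p(\hat u^k) + q(\hat v^k) - (p(u) + q(v)) + \langle \hat w^k - w, \nabla\phi(w)\rangle + \langle \hat u^k - u, \mathcal{A}x\rangle + \langle \hat v^k - v, \mathcal{B}x\rangle\\
&\quad - \langle \hat x^k - x, \mathcal{A}^*u + \mathcal{B}^*v - c\rangle
\le \big(\Phi_1(u,v,x) - \Phi_{k+1}(u,v,x)\big)/2k \le \Phi_1(u,v,x)/2k.
\end{aligned}
\tag{62}
\]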


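Next, we provide an explicit bound of $\Phi_1(u,v,x)$ for $(u,v,x) \in \mathcal{B}_k$ that depends only on the initial point and the optimal solution set. Since $\{\Phi_{i+1}(\bar u,\bar v,\bar x)\}$ is non-increasing in $i$, for $i \ge 1$ we have

\[
(\tau\sigma)^{-1}\|\bar x - x^{i+1}\|^2 + \|\bar u - u^{i+1}\|^2_{\mathcal{D}_1+\mathcal{S}} + \|\bar v - v^{i+1}\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}+\sigma\mathcal{B}\mathcal{B}^*} + \tfrac12\|\bar w - w^{i+1}\|^2_{\mathcal{Q}} \le \Phi_1(\bar u,\bar v,\bar x).
\]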
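Thus, summing up the above inequalities from $i = 1$ to $k$ and applying Jensen's inequality to the convex function $\|\cdot\|^2$, we can get that

\[
(\tau\sigma)^{-1}\Big\|\bar x - \frac1k\sum_{i=1}^{k}x^{i+1}\Big\|^2 + \|\bar u - \hat u^k\|^2_{\mathcal{D}_1+\mathcal{S}} + \|\bar v - \hat v^k\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}+\sigma\mathcal{B}\mathcal{B}^*} + \tfrac12\|\bar w - \hat w^k\|^2_{\mathcal{Q}} \le \Phi_1(\bar u,\bar v,\bar x).
\tag{63}
\]

Recall that $\tilde x^{i+1} = x^i + \sigma(\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c) = x^i + \tau^{-1}(x^{i+1} - x^i)$. Then we also have

\[
\sum_{i=1}^{k}\tilde x^{i+1} = \sum_{i=1}^{k}\big(x^i + \tau^{-1}(x^{i+1} - x^i)\big) = \sum_{i=1}^{k}x^{i+1} + (\tau^{-1} - 1)(x^{k+1} - x^1),
\]

which implies that for $k \ge 1$,

\[
\begin{aligned}
\|\bar x - \hat x^k\|^2
&= \Big\|\bar x - \frac1k\sum_{i=1}^{k}x^{i+1} - \frac1k(\tau^{-1} - 1)(x^{k+1} - x^1)\Big\|^2\\
&\le 2\Big\|\bar x - \frac1k\sum_{i=1}^{k}x^{i+1}\Big\|^2 + 2(\tau^{-1} - 1)^2\Big\|\frac1k(x^{k+1} - x^1)\Big\|^2\\
&\le 2\Big\|\bar x - \frac1k\sum_{i=1}^{k}x^{i+1}\Big\|^2 + 4(\tau^{-1} - 1)^2\|x^{k+1} - \bar x\|^2 + 4(\tau^{-1} - 1)^2\|x^1 - \bar x\|^2\\
&\le 2\Big\|\bar x - \frac1k\sum_{i=1}^{k}x^{i+1}\Big\|^2 + 8(\tau^{-1} - 1)^2\tau\sigma\,\Phi_1(\bar u,\bar v,\bar x).
\end{aligned}
\tag{64}
\]

Therefore, by (63) and (64), we can obtain that

\[
\begin{aligned}
&(\tau\sigma)^{-1}\|\bar x - \hat x^k\|^2 + \|\bar u - \hat u^k\|^2_{\mathcal{D}_1+\mathcal{S}} + \|\bar v - \hat v^k\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}+\sigma\mathcal{B}\mathcal{B}^*} + \tfrac12\|\bar w - \hat w^k\|^2_{\mathcal{Q}}\\
&\quad\le 2(\tau\sigma)^{-1}\Big\|\bar x - \frac1k\sum_{i=1}^{k}x^{i+1}\Big\|^2 + \|\bar u - \hat u^k\|^2_{\mathcal{D}_1+\mathcal{S}} + \|\bar v - \hat v^k\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}+\sigma\mathcal{B}\mathcal{B}^*} + \tfrac12\|\bar w - \hat w^k\|^2_{\mathcal{Q}}\\
&\qquad + 8(\tau^{-1} - 1)^2\Phi_1(\bar u,\bar v,\bar x)\\
&\quad\le \big(2 + 8(\tau^{-1} - 1)^2\big)\Phi_1(\bar u,\bar v,\bar x).
\end{aligned}
\tag{65}
\]

In addition, we also know from (61) that for $k \ge 1$,

\[
\|\mathcal{A}^*\hat u^k + \mathcal{B}^*\hat v^k - c\|^2 \le C_3/k^2 \le C_3.
\tag{66}
\]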
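Now putting inequalities (65) and (66) together, we see that for any $(u,v,x) \in \mathcal{B}_k$,

\[
\begin{aligned}
&(\tau\sigma)^{-1}\|x^1 - x\|^2 + \|u^1 - u\|^2_{\mathcal{D}_1+\mathcal{S}} + \|v^1 - v\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}} + \tfrac12\|w^1 - w\|^2_{\mathcal{Q}} + \sigma\|\mathcal{A}^*u + \mathcal{B}^*v^1 - c\|^2\\
&= (\tau\sigma)^{-1}\|(x^1 - \bar x) + (\bar x - \hat x^k) + (\hat x^k - x)\|^2 + \|(u^1 - \bar u) + (\bar u - \hat u^k) + (\hat u^k - u)\|^2_{\mathcal{D}_1+\mathcal{S}}\\
&\quad + \|(v^1 - \bar v) + (\bar v - \hat v^k) + (\hat v^k - v)\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}} + \tfrac12\|(w^1 - \bar w) + (\bar w - \hat w^k) + (\hat w^k - w)\|^2_{\mathcal{Q}}\\
&\quad + \sigma\|\mathcal{A}^*(u - \hat u^k) + \mathcal{B}^*(v^1 - \bar v) + \mathcal{B}^*(\bar v - \hat v^k) + (\mathcal{A}^*\hat u^k + \mathcal{B}^*\hat v^k - c)\|^2\\
&\le 3\big[(\tau\sigma)^{-1}\|x^1 - \bar x\|^2 + \|u^1 - \bar u\|^2_{\mathcal{D}_1+\mathcal{S}} + \|v^1 - \bar v\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}+2\sigma\mathcal{B}\mathcal{B}^*} + \tfrac12\|w^1 - \bar w\|^2_{\mathcal{Q}}\big]\\
&\quad + 3\big[(\tau\sigma)^{-1}\|\bar x - \hat x^k\|^2 + \|\bar u - \hat u^k\|^2_{\mathcal{D}_1+\mathcal{S}} + \|\bar v - \hat v^k\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}+\sigma\mathcal{B}\mathcal{B}^*} + \tfrac12\|\bar w - \hat w^k\|^2_{\mathcal{Q}}\big]\\
&\quad + 3\big[(\tau\sigma)^{-1}\|\hat x^k - x\|^2 + \|\hat u^k - u\|^2_{\mathcal{D}_1+\mathcal{S}+2\sigma\mathcal{A}\mathcal{A}^*} + \|\hat v^k - v\|^2_{\mathcal{Q}_{22}+\mathcal{D}_2+\mathcal{T}} + \tfrac12\|\hat w^k - w\|^2_{\mathcal{Q}}\big]\\
&\quad + 3\|\mathcal{A}^*\hat u^k + \mathcal{B}^*\hat v^k - c\|^2\\
&\le 6\Phi_1(\bar u,\bar v,\bar x) + 3\big(2 + 8(\tau^{-1} - 1)^2\big)\Phi_1(\bar u,\bar v,\bar x) + 3C_4 + 3C_3,
\end{aligned}
\]

where $C_4 = \max\big((\tau\sigma)^{-1}, \|\tfrac12\mathcal{Q} + \mathrm{Diag}(\mathcal{D}_1 + \mathcal{S} + 2\sigma\mathcal{A}\mathcal{A}^*, \mathcal{Q}_{22} + \mathcal{D}_2 + \mathcal{T})\|\big)$. Then we can get that for any $(u,v,x) \in \mathcal{B}_k$, $\Phi_1(u,v,x)$ is bounded by a positive constant

\[
C_5 = \big(12 + 24(\tau^{-1} - 1)^2\big)\Phi_1(\bar u,\bar v,\bar x) + 3C_4 + 3C_3.
\]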

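This, together with (62), implies that

\[
\begin{aligned}
&(p(\hat u^k) + q(\hat v^k)) - (p(u) + q(v)) + \langle \hat w^k - w, \nabla\phi(w)\rangle + \langle \hat u^k - u, \mathcal{A}x\rangle + \langle \hat v^k - v, \mathcal{B}x\rangle\\
&\quad - \langle \hat x^k - x, \mathcal{A}^*u + \mathcal{B}^*v - c\rangle \le C_5/2k.
\end{aligned}
\]

Under the conditions for case (b), by a very similar approach we can get an upper bound $C_6$ for $(\Psi_1(u,v,x) + \Xi_1)$ such that

\[
\begin{aligned}
&(p(\hat u^k) + q(\hat v^k)) - (p(u) + q(v)) + \langle \hat w^k - w, \nabla\phi(w)\rangle + \langle \hat u^k - u, \mathcal{A}x\rangle + \langle \hat v^k - v, \mathcal{B}x\rangle\\
&\quad - \langle \hat x^k - x, \mathcal{A}^*u + \mathcal{B}^*v - c\rangle \le (\Psi_1(u,v,x) + \Xi_1)/2k \le C_6/2k.
\end{aligned}
\]

The above arguments show that for both cases, the property (59) holds.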

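(iii) For the complexity of primal objective functions, first, we know from (13) that

\[
\begin{aligned}
p(u) &\ge p(\bar u) + \langle -\mathcal{A}\bar x - \nabla_u\phi(\bar w), u - \bar u\rangle \quad \forall u \in \mathcal{U},\\
q(v) &\ge q(\bar v) + \langle -\mathcal{B}\bar x - \nabla_v\phi(\bar w), v - \bar v\rangle \quad \forall v \in \mathcal{V}.
\end{aligned}
\]

Therefore, summing them up and noting that $\mathcal{A}^*\bar u + \mathcal{B}^*\bar v = c$ and that the function $\phi$ is convex, we have that

\[
\begin{aligned}
\theta(u,v) - \theta(\bar u,\bar v) &\ge -\langle \bar x, \mathcal{A}^*u + \mathcal{B}^*v - c\rangle + \phi(w) - \phi(\bar w) - \langle \nabla\phi(\bar w), w - \bar w\rangle\\
&\ge -\langle \bar x, \mathcal{A}^*u + \mathcal{B}^*v - c\rangle \quad \forall u \in \mathcal{U},\ v \in \mathcal{V}.
\end{aligned}
\]

Thus, with $(u,v) = (\hat u^k,\hat v^k)$, it holds that

\[
\begin{aligned}
\theta(\hat u^k,\hat v^k) - \theta(\bar u,\bar v) &\ge -\langle \bar x, \mathcal{A}^*\hat u^k + \mathcal{B}^*\hat v^k - c\rangle\\
&\ge -\tfrac12\big(\tfrac1k\|\bar x\|^2 + k\|\mathcal{A}^*\hat u^k + \mathcal{B}^*\hat v^k - c\|^2\big) \ge -\tfrac12\big(\|\bar x\|^2 + C_3\big)/k,
\end{aligned}
\tag{67}
\]

where $C_3$ is the same constant as in (61).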

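For the reverse part, by (9) and (10) we can obtain that for any $i \ge 1$,

\[
\begin{aligned}
\phi(w^{i+1}) &\le \phi(w^i) + \langle \nabla\phi(w^i), w^{i+1} - w^i\rangle + \tfrac12\|w^{i+1} - w^i\|^2_{\mathcal{Q}+\mathcal{H}},\\
\phi(\bar w) &\ge \phi(w^i) + \langle \nabla\phi(w^i), \bar w - w^i\rangle + \tfrac12\|\bar w - w^i\|^2_{\mathcal{Q}},
\end{aligned}
\]

which indicate that

\[
\phi(w^{i+1}) - \phi(\bar w) \le \langle \nabla\phi(w^i), w^{i+1} - \bar w\rangle + \tfrac12\|w^{i+1} - w^i\|^2_{\mathcal{Q}+\mathcal{H}} - \tfrac12\|\bar w - w^i\|^2_{\mathcal{Q}}.
\tag{68}
\]

Thus, (??) and (68) imply that for $\tau \in (0,1]$ and any $i \ge 1$,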

— 9{u,v) 

< - ^\\w^ - wfQ + {w- w^+\ Q{w^+^ - w^)) + {u - 

+{v-v^+\Bx^+^) - -v\Ql 2 {u-u^+^))+a{A*{u^+^-u),B*{v^+^-A)) 

+ {u - u^+\ (Pi + 5)(n*+i - n*)) + (n - v^+\ (P 2 + r){A+^ - v^)) (gg) 

< i(A, - A,+i) - i(||u*+i - + ||u*+' - v% + + B*A - cf 

+a{l - t)\\A*u^+^ + B*v^+^ - cf) 

— 2^"^* ~ -^i+l)- 

Therefore, summing up the above inequalities over $i = 1, \ldots, k$ and using the convexity of function $\theta$, we can obtain that

\[
\theta(\hat u^k,\hat v^k) - \theta(\bar u,\bar v) \le \big(\Lambda_1(\bar u,\bar v,\bar x) - \Lambda_{k+1}(\bar u,\bar v,\bar x)\big)/2k \le \Lambda_1(\bar u,\bar v,\bar x)/2k.
\tag{70}
\]

The inequalities (67) and (70) indicate that (60) holds for case (a).

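Next, assume that the conditions for case (b) hold. Similar to (69), we have that

\[
\begin{aligned}
\theta(u^{i+1},v^{i+1}) - \theta(\bar u,\bar v)
&\le \tfrac12(\bar\Lambda_i - \bar\Lambda_{i+1}) - \tfrac12\big(\|u^{i+1} - u^i\|^2_{\mathcal{S}-\eta\mathcal{D}_1} + \|v^{i+1} - v^i\|^2_{\mathcal{T}+\min(\tau,1+\tau-\tau^2)\sigma\mathcal{B}\mathcal{B}^*-\eta\mathcal{D}_2}\\
&\qquad + \min(1, 1+\tau^{-1}-\tau)\,\sigma\|\mathcal{A}^*u^{i+1} + \mathcal{B}^*v^{i+1} - c\|^2\big).
\end{aligned}
\]

By the assumptions that $\mathcal{S} - \eta\mathcal{D}_1 \succeq 0$ and $\mathcal{T} - \eta\mathcal{D}_2 \succeq 0$, we can obtain that

\[
\theta(\hat u^k,\hat v^k) - \theta(\bar u,\bar v) \le (\bar\Lambda_1 - \bar\Lambda_{k+1})/2k \le \bar\Lambda_1/2k.
\tag{71}
\]

Thus, by (70) and (71) we can obtain the inequality (60). □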

Below we make a couple of remarks about the results in Theorem 4.3.

Remark 4.4 The result in part (ii) can be regarded as an ergodic version of (48) on the KKT optimality condition, though less explicit. Note that if one takes the square root of (48), the right-hand side becomes $O(1/\sqrt{k})$. The inequality (59) indicates that the majorized ADMM requires no more than $O(1/\varepsilon)$ iterations to obtain an $\varepsilon$-approximate solution in the sense of (59). Results of this kind have been studied for the (proximal) classic ADMM with separable objective functions, see for example [?], and in a recent work by Li et al. [13] for a majorized ADMM with indefinite proximal terms.

Remark 4.5 The results in parts (i) and (iii), which concern the ergodic complexity of the primal feasibility and the objective function, respectively, are extended from the work of Davis and Yin [3] on the classic ADMM with separable objective functions. These results are more explicit than the one in part (ii). However, there is no corresponding result available on the dual problem. Therefore, it would be very interesting to see whether one can develop a more explicit ergodic complexity result containing all three parts of the KKT condition.


5 Conclusions 

In this paper, we establish the convergence properties of the majorized ADMM with a large step length for solving linearly constrained convex programming problems whose objective function includes a coupled smooth function. From Theorem 4.1, one can see the influence of the coupled objective on the convergence condition. For $\tau \in (0,\frac{1+\sqrt5}{2})$, a joint condition like (37) or (39) is needed to analyze the behaviour of the iteration sequence. One can further observe that the parameter $\eta$, which controls the off-diagonal term of the generalized Hessian, also affects the choice of the proximal operators $\mathcal{S}$ and $\mathcal{T}$. However, as pointed out in Remark 4.2, when the coupled function is convex quadratic, $\eta = 0$ and the corresponding influence disappears. Although this paper focuses on the 2-block case, it is not hard to see that, with the help of the Schur complement technique introduced in [14], one can apply our majorized ADMM to solve large scale convex optimization problems with many smooth blocks.

Acknowledgements. The authors would like to thank Dr. Caihua Chen at the Nanjing 
References

[1] Chen, C., He, B., Ye, Y. and Yuan, X. (2014). The direct extension of ADMM for multi-block
convex minimization problems is not necessarily convergent. Mathematical Programming,
Series A, DOI 10.1007/s10107-014-0826-5.

[2] Clarke, F. H. (1990). Optimization and Nonsmooth Analysis, 2nd edition, Classics in Applied
Mathematics, vol. 5, Society for Industrial and Applied Mathematics, Philadelphia.

[3] Davis, D. and Yin, W. (2014). Convergence rate analysis of several splitting schemes,
arXiv:1406.4834.

[4] Eckstein, J. and Bertsekas, D. P. (1992). On the Douglas-Rachford splitting method and 
the proximal point algorithm for maximal monotone operators. Mathematical Programming, 
55(1-3), 293-318. 

[5] Eckstein, J. and Yao, W. (2014). Understanding the convergence of the alternating direction
method of multipliers: Theoretical and computational perspectives, RUTCOR Research
Report.

[6] Gabay, D. (1983). Applications of the method of multipliers to variational inequalities, in
Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value
Problems, M. Fortin and R. Glowinski, eds., vol. 15 of Studies in Mathematics and Its
Applications, Elsevier, pp. 299-331.

[7] Gabay, D. and Mercier, B. (1976). A dual algorithm for the solution of nonlinear variational 
problems via finite element approximation. Computers and Mathematics with Applications, 
2(1), 17-40. 

[8] Glowinski, R. (1980). Lectures on numerical methods for nonlinear variational problems, 
vol. 65 of Tata Institute of Fundamental Research Lectures on Mathematics and Physics, 
Tata Institute of Fundamental Research, Bombay, Notes by M. G. Vijayasundaram and M. 
Adimurthi. 

[9] Glowinski, R. and Marroco, A. (1975). Sur l'approximation, par éléments finis d'ordre un, et
la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires.
Revue Française d'Automatique, Informatique et Recherche Opérationnelle, 9(R-2), 41-76.

[10] He, B. and Yuan, X. (2012). On the O(1/n) Convergence Rate of the Douglas-Rachford
Alternating Direction Method, SIAM Journal on Numerical Analysis, 50(2), 700-709.

[11] Hiriart-Urruty, J. B., Strodiot, J. J. and Nguyen, V. H. (1984). Generalized Hessian matrix
and second-order optimality conditions for problems with C^{1,1} data. Applied Mathematics
and Optimization, 11(1), 43-56.

[12] Hong, M., Chang, T. H., Wang, X., Razaviyayn, M., Ma, S. and Luo, Z. Q. (2014). A
Block Successive Upper Bound Minimization Method of Multipliers for Linearly Constrained
Convex Optimization, arXiv:1401.7079.

[13] Li, M., Sun, D. and Toh, K. C. (2014). A Majorized ADMM with Indefinite Proximal Terms 
for Linearly Constrained Convex Composite Optimization, arXiv:1412.1911. 

[14] Li, X. D., Sun, D. F. and Toh, K. C. (2014). A Schur Complement Based Semi-Proximal
ADMM for Convex Quadratic Conic Programming and Extensions, Mathematical Programming,
Series A, DOI 10.1007/s10107-014-0850-5.

[15] Monteiro, R. D. and Svaiter, B. F. (2013). Iteration-complexity of block-decomposition
algorithms and the alternating direction method of multipliers, SIAM Journal on Optimization,
23(1), 475-507.

[16] Nesterov, Y. (2013). Gradient methods for minimizing composite functions. Mathematical
Programming, Series B, 140(1), 125-161.